ABSTRACT


GRADES AND DATA DRIVEN DECISION MAKING: ISSUES OF VARIANCE AND

STUDENT PATTERNS 

By 

Alex Jon Bowers 

This study addresses the question: To what extent are teacher assigned subject-
specific grades useful for data driven decision making in schools? Recently, schools
have been urged to bring teachers and school leaders together around student-level data 
in an effort to increase dialogue, collaboration and professional communities to improve 
educational practice through data driven decision making. However, schools are 
inundated with data. While much attention has been paid to the use and reporting of 
standardized test scores in policy, school and district-level data driven decision making, 
much of the industry of schools is devoted to the generation and reporting of grades. 
Historically, little attention has been paid to student grades and grade patterns and their 
use in predicting student performance, standardized assessment scores and on-time 
graduation. This study analyzed the entire K-12 subject-specific grading and assessment 
histories of two cohorts in two separate school districts through correlations and a novel 
application of cluster analysis. Results suggest that longitudinal K-12 grading histories 
are useful. Grades and standardized assessments appear to be converging over time for 
one of the two school districts studied, suggesting that for one of the districts but not the 
other, current accountability policies and state curriculum frameworks may be pushing 
into classrooms and modifying teachers’ daily practice, as measured through an
increasing correlation of grades and standardized assessments. Moreover, using cluster 
analysis, K-12 subject specific grading patterns appear to show that early elementary 
school grade patterns predict future student grade patterns as well as qualitative student
outcomes, such as on-time graduation. The findings of this study also suggest that K-12
subject specific grade patterning using cluster analysis is an advance over past methods
of predicting students at-risk of dropping out of school. Additionally, the evidence
supports a finding that grades may be an assessment of both academic knowledge and a
student’s ability to negotiate the social processes of school.
CHAPTER I: INTRODUCTION


The use of data to inform the decision making of school leaders and teachers in 
K-12 American schools continues to be a topic emphasized not only by organizational 
researchers who see data driven decision making as a means of instructional 
improvement (Bernhardt, 2004; Coburn & Talbert, 2006; Halverson et al., 2005; Kerr et
al., 2006; Raudenbush, 2005; Streifer, 2004; Thorn, 2002; Wayman & Stringfield, 2006a;
V. M. Young, 2006), but also according to law and policy as stricter mandates have been
passed requiring data reporting and evidence based practice in schools (Earle & Fullan,
2003). Schools are inundated with data, including grades, attendance, discipline records,
and standardized test scores (Creighton, 2001a). While much attention has been paid to 
using standardized test scores for data driven decision making (Bernhardt, 2004; Streifer,
2004), much of the industry of schools is devoted to grades, creating a dualistic system:
one based on standardized testing and decision making that reports to policymakers and 
the government, the other based on grades that reports to students, parents and the 
community (Farr, 2000). Thus, the question for this study is, can grades be used for data 
driven decision making? 

Historically, grades have been criticized for being subjective and unreliable 
measures of student achievement (Cross & Frary, 1999; Kirschenbaum et al., 1971).
While standardized assessments have undergone a “virtual revolution” over the past 
thirty years in reliability and validity of measuring student academic achievement (Cizek, 
2000), no such revolution has occurred in the arena of grades (Cizek et al., 1995-1996;
Trumbull, 2000b). If grades are subjective and unreliable, how do they fit into a 
discussion of data-driven decision making for school improvement? One approach 




Bowers, A.J. (2007) 



identified in the literature shifts emphasis away from the criticism of the subjectivity of
grades to a discussion of ways of making grades more valid through triangulation and
cross-referencing grading data with numerous other data sources in schools (Bernhardt,
2004), and aligning grading with state curriculum standards (Farr, 2000). Thus, one of the
hypotheses of this study is that while grades may have been subjective and unreliable
assessments in the past, it may be that currently, as schools are pressured to align
assessments with state mandated criteria and curriculum, the two systems of grades and
standard assessments are converging into one, increasing the correlation between the two
assessment systems.

One theory proposed for past grade subjectivity has focused on the influence of
teacher and student perceptions on student grades. It is hypothesized that students who
receive high grades in early elementary school continue to receive high grades throughout
their schooling career due to the positive motivation of high grades and teacher and
student perception of student ability based on past student performance (Hargis, 1990),
termed here the “Hargris hypothesis.” Moreover, it is hypothesized that students who are
given low grades early on are locked into a cycle of low grading. However, the question
of how students’ grades pattern over time has not been empirically tested to date. If past
grading patterns are predictive of future student grade patterns, this would allow school
leaders to predict future student grade performance outcomes (such as in high school) in
elementary school in specific subjects, and thus design instructional interventions for
individual students in specific subjects before they become locked into a cycle of low
grading patterns with a higher probability of dropping out of school.
Hence, the research questions for this study are: 1) To what extent has the
correlation between grades and standardized assessments changed over time? 2) To what
extent does the hypothesis that past student grade patterns predict future student grade
patterns hold true? 3) To what extent is grade patterning useful in predicting student
outcomes such as graduation or dropping out? To what extent do these predictive patterns
aid in identifying avenues for early intervention by instructional leaders and teachers? 

This study outlines two domains for research within the broader issue of using 
grades as data for decision making by educational leaders. First, to explore the possibility 
that grades and standardized assessments may be converging, subject specific grades and 
standardized state assessment scores were correlated for the 1994 and 2006 graduating 
cohorts from two separate K-12 school districts. The evidence suggests that subject 
specific grades and standardized assessments may be converging for one of the two 
districts. This may be an indication that assessment policies may have affected one of 
these two school districts, but not the other. 
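
The cohort comparison described above can be illustrated with a minimal sketch. It assumes a Pearson correlation between a subject grade and the matching state assessment score, computed separately for each cohort; the records, scales and variable names below are hypothetical and are not drawn from the study data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical records: (math grade on a 0-4 scale, state math assessment score)
cohorts = {
    "1994": [(3.7, 520), (2.0, 540), (3.0, 480), (1.3, 500), (2.7, 560)],
    "2006": [(3.7, 580), (2.0, 470), (3.0, 540), (1.3, 450), (2.7, 520)],
}

# One correlation per graduating cohort; a rise from the earlier cohort to the
# later one is what "convergence" of the two assessment systems would look like.
correlations = {}
for year, records in cohorts.items():
    grades = [grade for grade, _ in records]
    scores = [score for _, score in records]
    correlations[year] = pearson_r(grades, scores)
    print(year, round(correlations[year], 2))
```

In this invented example the later cohort shows the stronger grade-to-assessment correlation, which is the pattern the convergence hypothesis predicts; real district data would of course need far larger samples per cohort and subject.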

Second, a novel application of hierarchical cluster analysis is used to explore 
whether early student grade patterns are predictive of future student grade patterns and if 
overall student grade patterns are predictive of qualitative student outcomes, such as on- 
time graduation. The data suggests that generally, early student grade patterns are 
predictive of future student grade patterns. The application of cluster analysis to 
longitudinal subject-specific K-12 student grade data allows for the identification of 
specific timepoints in early elementary school for multiple clusters of students that may 
be important in the decision of where to apply the limited resources of a school district 
for data driven decision making by educational leaders. Additionally, cluster analysis of 
subject-specific grades appears to be an advance over past methods of prediction of
students at-risk of not graduating on time, not only using K-12 grade data, but 
interestingly also K-8, K-6, and even K-1. Furthermore, the evidence suggests that grades 
may be an assessment of both academic knowledge and a student’s ability to negotiate 
the social processes of school. 
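
As a rough illustration of how hierarchical cluster analysis groups students by grade history, the sketch below performs agglomerative clustering on a handful of invented grade trajectories. The distance measure (Euclidean) and linkage rule (average linkage) are assumptions made for the example, since this section does not state which were used in the study, and the student labels and grades are hypothetical.

```python
from math import dist  # available in Python 3.8+


def average_linkage(a, b, rows):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    return sum(dist(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(rows, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], rows)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected by the deletion
    return [sorted(c) for c in clusters]


# Hypothetical grade histories (0-4 scale), one value per grade level K, 1, 2, 3
students = {
    "A": [3.7, 3.7, 4.0, 3.3],
    "B": [1.0, 1.3, 1.0, 0.7],
    "C": [3.3, 3.7, 3.7, 3.7],
    "D": [1.3, 1.0, 1.7, 1.0],
}
names = list(students)
rows = [students[name] for name in names]

# Translate index-based clusters back into student labels.
groups = [[names[i] for i in cluster] for cluster in agglomerate(rows, 2)]
print(groups)
```

Each row is one student's whole trajectory, so students are grouped by the shape of their grade history rather than by any single year's grade. A library routine such as SciPy's hierarchical clustering would be the usual choice for a real district dataset; the hand-written version is used here only to keep the example dependency-free.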

Through the analysis of K-12 subject-specific grades and standardized 
assessments, the contention of this study is that teacher assigned subject specific grades 
are important and useful as data for data driven decision making by educational leaders. 
Furthermore, as data that schools already collect on students, grades may predict future 
student outcomes, providing grade-level and subject-specific intervention points for 
school and district-level data driven decision making. 
CHAPTER II: LITERATURE REVIEW


Using Data for Instructional Improvement 

Over the forty years since the publication of the Coleman report (Coleman et al.,
1966) and as the demands of the accountability movement have gradually increased,
schools and school districts have increasingly come under pressure to improve
performance through the use of data analysis (Fullan, 2000) to the point where data
analysis in schools has become unavoidable (Earle & Fullan, 2003). Currently, much of
the literature urges school leaders and decision makers to use data-driven decision 
making to help inform their practice and help them make sound decisions based on what 
the data in their schools tells them (Bernhardt, 2004; Halverson et al., 2005; Streifer,
2004; Wayman, 2005; Wayman & Stringfield, 2006a). 

It has been argued by Elmore that the relatively recent increase in accountability 
and performance pressures on the educational system from external agencies is due to a 
switch from what he terms the “attainment culture” to the “performance culture”
(Elmore, 2002, 2003). In the attainment culture, schools were judged by how well
children who were deemed worthy of an education were moved through the system. 
However, beginning with standardized tests in the 1960s, and continuing to this day, vast 
inequities were realized within the system, revealing the large differences in test scores 
and knowledge acquisition not only between children of high SES families and low SES 
families but also between different ethnic groups. With this realization by businesses, 
government, and policy makers, along with the general rise of performance-based 
methods and organizations in the general society, schools have been pressured to change 
to a performance culture. What this means is that political leaders link funding to 
progress by the educational system in an attempt to hold the educational system
accountable for the learning of the historically lower performing SES and ethnic groups
revealed by standardized testing. Moreover, this performance culture has risen concurrent
with, and is likely linked with, school and district attempts to incorporate the tenets of
quality management, including continuous improvement, customer focus, systems
thinking, process evaluation and data-driven decision making (Detert et al., 2000). This
performance and quality management culture has shifted the processes of schools from
the education of a selection of students with low accountability to the public to the
education of all students with high accountability to the public, creating a need for
schools and districts to examine their data closely to determine what decisions to make
about what works in schools (Fullan, 2001; Raudenbush, 2005).

The Challenges of Using Data in the School Context to Make Decisions 

With the advent of the performance culture and quality management, schools have
begun to focus on collecting, discussing and using data to inform decision making
processes (Bernhardt, 2004; Coburn & Talbert, 2006; Halverson et al., 2005; Kerr et al.,
2006; Streifer, 2004; Thorn, 2002; Wayman & Stringfield, 2006a; V. M. Young, 2006).
While in the past, decisions by school leaders oftentimes were based on intuition, fads,
rules of thumb, or past experience (Bernhardt, 2004; Creighton, 2001a; Earle & Fullan,
2003), exemplary schools and school districts have been shown to use data effectively to
improve instruction (Edmonds, 1979; Elmore & Burney, 1999; Hightower &
McLaughlin, 2005; Kerr et al., 2006; Massell & Goertz, 2002; Schmoker, 1999). For
these effective schools, data use, through monitoring of student academic progress and
intervention for individual students, was one of five factors that also included a focus on
basic skills, high expectations for all students, strong instructional leadership and an
orderly environment (Murphy & Hallinger, 2001; Teddlie, 1994). While multiple
examples of high performing low income schools exist, “what works” (Raudenbush,
2005) remains a question for leaders in schools and districts.

For many educational leaders working from a quality management perspective,
the question of what works motivates them to examine the effects of the organization on
the students, and determine the causes of those effects (Supovitz, 2002). It has been well
argued that the gold standard for determining causality is randomized controlled trials
(Raudenbush, 2005). Only through random assignment of treatment and controls is a
researcher able to say with certainty if a specific intervention caused an outcome.
However, for most schools, it is prohibitively expensive to conduct such trials
(Raudenbush, 2005). While some authors have argued that school districts randomly
assigning scarce resources and then tracking the outcomes over long periods of time is
not only possible, but has succeeded in the past (such as in the Perry preschool study)
(Rothstein, 2004), for the vast majority of schools, random controlled experiments are
beyond the scope of their expertise and funding (Streifer, 2004). Thus, school leaders rely
on specific statistical techniques to aid them in using data effectively.

Most often, the next best technique is multiple regression statistical analysis
(Streifer, 2002). Using multiple regression, an evaluator is able to take the vast variety of
data generated by students and use it to predict future student outcomes on specific
variables, such as state test scores (Streifer, 2004). However, using this technique in
schools violates many of the assumptions of multiple regression, including large enough
sample sizes, the independence of cases, the independence of variables (multicollinearity),
normality of the data, the independence of variance explained from the variance
remaining, and the homogeneity of variance across cases (Cohen et al., 2003; Howell,
2002; Rencher, 2002). Though additional statistical methods such as data mining
algorithms (Streifer, 2004, 2005) or hierarchical linear regression techniques (such as
HLM) (Raudenbush & Bryk, 2002) address some of these issues with multiple
regression (namely multicollinearity and the dependence of cases), the other issues
remain, leaving multiple regression as a poor statistical procedure for use by school
leaders. Furthermore, inferential statistical techniques such as multiple regression are 
designed to estimate the mean for the population from which a sample is taken. If one 
already possesses all of the data for a selected population (such as a school district), there 
is no need to estimate the population means since one can calculate them directly. Many 
school leaders who are looking to determine what works in their schools do not wish to 
generalize their population of students to the greater population averages, which is the 
purpose of multiple regression. Rather, they wish to know what is working and is not 
working for their specific students for the very near future, measured in near-term time- 
frames (Creighton, 2001a). 
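
The point that a district holding records for every one of its students can compute population values directly, rather than estimating them, can be sketched as follows. The school names and grades are hypothetical, and the population standard deviation is used on the assumption that the list contains every student rather than a sample.

```python
from statistics import fmean, pstdev

# Hypothetical: every student's math grade (0-4 scale) in one small district
district_grades = {
    "School 1": [3.0, 2.7, 3.3, 1.7, 2.0],
    "School 2": [2.3, 3.7, 1.0, 2.7],
}

all_grades = [g for grades in district_grades.values() for g in grades]

# With the full population in hand these are exact descriptive values,
# not estimates, so no standard errors or significance tests are attached.
summary = {
    school: {"n": len(grades), "mean": fmean(grades), "sd": pstdev(grades)}
    for school, grades in district_grades.items()
}
district_mean = fmean(all_grades)

for school, stats in summary.items():
    print(school, stats["n"], round(stats["mean"], 2), round(stats["sd"], 2))
print("District", round(district_mean, 2))
```

Descriptive summaries of this kind answer the question of what is happening for these specific students now, which is the near-term question the paragraph above attributes to school leaders.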

Leveraging data to make decisions at the school level is a complicated process 
(Wayman, 2005). It has been argued that educational leaders should forgo the more 
difficult and complex issues around higher level statistics and concentrate rather on 
collecting data from multiple sources, including test scores, grades, demographic data, 
school processes, community and organization perceptions. They should use descriptive 
statistics to better understand what the data says for their specific situation, making more 
informed decisions based on those descriptive reports that create an overall picture of 
what is occurring in schools (Bernhardt, 2004; Halverson et al., 2005; Kerr et al., 2006).
While the literature points to the types of data to be used, and different ways to analyze
that data, it is school leaders who must use that data in decision making processes.

While some schools have led successful improvement efforts, instructional
improvement across the system is acknowledged as spotty and in need of much more
improvement, especially for children of urban, low SES and ethnic minority families
(Elmore, 2002). It has been argued that since the 1970s we have had all of the data
needed to improve schooling for not only these subgroups of children, but all children
(Edmonds, 1979; Marshall, 1997). Schools, however, are awash in data, generating
standardized test scores, achievement scores, grades, attendance, discipline reports and
portfolios on each student. Such data can result in a disorganized and incoherent database
(Brunner et al., 2005; Cizek, 2000; Earle & Katz, 2003; Salpeter, 2004; Streifer, 2004)
presented in dense and inaccessible reports to school leaders and teachers (Wayman et
al., 2004) who on average have a rudimentary training in statistics (Creighton, 2001b;
Earle & Fullan, 2003; Secada, 2001). For educational leaders in the current era however,
linking instructional improvement to a critical analysis of data is now unavoidable (Earle
& Fullan, 2003). While many school leaders currently focus on standardized test scores,
data is being collected daily on every student in multiple ways (Bernhardt, 2004;
Creighton, 2001b). Much of this data collection of schools is centered on grades.

Grades, Grading and Marks 

Historically, the vast industry of data collection in schools and school districts has
centered on grades. Since its inception, the American public school system has had a
focus on grades and grading (Quann, 1983) with the purpose of providing feedback to
administrators and potential employers for predicting students’ future performance from
current grades, guiding students to areas of aptitude, providing student performance
information to parents and administrators, and motivating students to do well and be well
disciplined (Evans, 1976; Trumbull, 2000b). For students, working to achieve a high
grade, compete against fellow students, or game the system takes up a large percentage of
their time. For teachers, designing and proctoring assessments, grading the assessments,
and negotiating with students over their grades requires substantial amounts of time both
inside and outside of the school day (Hargis, 1990; Kirschenbaum et al., 1971). These
unending demands on time concerning grades and grading for both teachers and students
are in addition to the relatively recent introduction of standardized testing. With the
advent of state and federally mandated testing, schools and school districts are
increasingly devoting more and more time to preparing for and administering these state
standardized tests (Militello, 2004; Salpeter, 2004). The work surrounding grades and
grading however, continues unabated. As a result, in American K-12 education we have a
dualistic assessment system (Farr, 2000), one based on psychometrically standardized
tests, and one based on the subjective industry of acquiring and awarding grades.

In the past, standardized tests were criticized for how they were
used and for their validity (Goslin, 1968). More recently, standardized testing
has undergone a “virtual revolution” with an increase in test validity and reliability
(Cizek, 2000). Unfortunately, no such revolution has occurred in the arena of grades and
grading (Cizek, 2000; Cizek et al., 1995-1996; Trumbull, 2000b). For those who have
examined grades and grading practices, grades and marks have been reported to be highly
variable and subjective, failing to adequately perform the stated purpose of providing
feedback, prediction, guidance, information and motivation for students and their parents
(Hargis, 1990; Kirschenbaum et al., 1971; S. Simon, 1976).

A study of four elementary schools in California demonstrated the subjectivity 
and the role of teacher perception on student achievement. All students were given a test 
that teachers had been told would predict the IQ gains of the child over the next year. 
About ten students were selected at random from every classroom and assigned a high 
score. The teachers were then told only about the scores, not that the children were 
randomly assigned. These children in each class were then used as the experimental 
group and the remaining children as controls. At the end of the year, children in the 
experimental group from kindergarten, first and second grades made significant gains on 
IQ tests and achievement measures over the controls, and teachers rated the children as 
more cooperative, more socially adjusted and more well behaved (Rosenthal & Jacobsen, 
1969). While dubious ethically, and followed by multiple publications questioning the 
veracity of the claims of the study (Elashoff & Snow, 1971), the basic assertion that in 
early grades teacher perception may influence student achievement has been supported 
(Raudenbush, 1984; Spitz, 1999). Thus, student outcomes may be dependent on teacher 
perceptions. 

Other studies have examined the practice of grading and how teachers construct 
grades by incorporating many different factors. These practices have been termed 
“hodgepodge” grading practices (Brookhart, 1991; Cross & Frary, 1999) with little 
reliability; it is basically random within schools, differentially incorporating academic 
achievement as well as effort, improvement and behavior into assigned grades (Cross & 
Frary, 1999; Frary et al., 1993). As Talcott Parsons identified over 45 years ago, grades
have been recognized as indicators of academic, interpersonal and social factors (Parsons,
1959). Recent studies have shown that teachers independently incorporate into grades
such personalized measures as classroom participation, attendance, behavior and conduct,
completion of homework, achievement on homework, student ability, student growth and
improvement, effort, and achievement on classroom assessments (Cross & Frary, 1999).
One could argue, then, that what a grade represents is different for different teachers and 
different students within the same school building. With such a system of grades, what a 
single letter grade represents is unknown. 

In addition to their dependence on perception and to their hodgepodge grading 
nature, grades have long been criticized as essentially subjective. The seminal studies of 
Starch and Elliot demonstrated the subjectivity in teacher graded English, geometry and 
history exams (Starch & Elliot, 1912, 1913a, 1913b). In their first study, the researchers 
took two English exam questions and answers, and sent the sets to 200 schools requesting 
that the head of the English department grade the answers on a 100 point scale. Of the 
approximately 150 exams returned, the range in scoring was about 39 points for both, 
meaning that while some teachers gave a high “A”, others gave a “D” to the same 
answers. Once this study was published, an outcry arose that English was a subjective 
subject, so a large range on an English exam could be expected (S. B. Simon & Bellanca, 
1976). Subsequently, Starch and Elliot attempted the same study with geometry. Of 138 
geometry exams graded and returned, the range in scores on a one hundred point scale 
was 45 points, even greater than the English exam (Starch & Elliot, 1913a). A replication 
for a history exam also gave a range of 40 points (Starch & Elliot, 1913b). Evaluation of 
the comments given by the graders showed that teachers scored the exams very 
differently, marking for a combination of penmanship, neatness, spelling, showing all
work, and the right answer, no matter the subject area. The scores returned represented a
normal curve; the same result one would expect sampling a random population. These
results indicated that exam scoring is subjective, and variation among teachers is random.
Although much has been done to increase the objectivity, reliability and validity of
standardized tests, little to no progress in the area of grading objectivity and reliability
has been observed in the 100 years since these problems were first described (Cizek,
2000; Kirschenbaum et al., 1971).

The reasons behind this subjectivity have been sparsely addressed in the literature. 
For those who have studied it, this persistent problem of grades has been attributed to 
teacher isolation (Elmore, 2002), little to no training or preparation in testing and 
measurement for teachers (Carr, 2000), and a lack of dialogue and communication 
between teachers and district administration about grading standards and alignment 
(Cross & Frary, 1999; Kirschenbaum et al., 1971). Furthermore, bias in assessment and
scoring is thought to be widespread (Trumbull, 2000a). Combined with the subjectivity of 
grades discussed above, these issues add to the list of problems with grades. 

Three current theories provide insight into issues related to grade subjectivity. 
First, it has been noted that the practice of grouping children in grades based on their age, 
with an arbitrary cutoff date for yearly enrollment, generates classrooms that are assumed 
to have low variance of pre-knowledge among the children. However, in fact, the 
children have been shown to differ by as much as three grade equivalent years of 
knowledge upon entrance to first grade, a gap that continues to increase in subsequent 
years (Hargis, 1990). To cope with the vast variance contained within a classroom,
teachers provide one level of instruction, directed to the median knowledge-base and
ability of the class (Hargis, 1990). Second, as a counter to the known subjectivity of 
teacher assigned grades and to increase the statistical calculation of test reliability, 
psychometricians have instructed teachers to increase classroom variance by placing 
extremely difficult questions on tests. This practice increases the ranking capability of the 
tests (Carr, 2000; Cross & Frary, 1999), and also ensures that some portion of students 
will have difficulty or may fail. Third, while empirical data is sparse, it has been 
hypothesized that children who receive high grades continue to receive high grades 
throughout the schooling process, and children who receive low grades continue to 
receive low grades due to the positive motivation of high grades, the absence of 
motivation of low grades, teacher perceptions of student ability based on past grades, and 
the ability tracking assigned to grades by the organization (Evans, 1976; Hargis, 1990; 
Kirschenbaum et al., 1971). Combined, these three theories indicate that the traditional
grading system ensures that a certain percentage of students will fail, as children are 
graded, ranked and tracked through the system (Hargis, 1990). This is especially 
troubling given the above discussion of the subjectivity and “hodge-podge” nature of 
teacher assigned grades. 


Using Grades for Data-Driven Decision Making 

The question that underscores this study is: If grades are subjective, invalid and 
unreliable, how do they fit into a discussion of data-driven decision making for school 
improvement? While it has been argued that due to these problems with grading, grades 
could be eliminated from schools and students could be judged only on standard 
assessments (Kohn, 1994), it is understood that grades are an integral part of the function,
structure, and community perception of schools and are thus here to stay (Hargis, 1990;
Kirschenbaum et al., 1971; Trumbull, 2000b). Thus, the emphasis has shifted to a
discussion of ways of making grading more “instructionally valid” (Newmann, 1991), 
triangulating and cross-referencing grading data with the numerous other data sources in 
schools (Bernhardt, 2004), and aligning grading with state curriculum standards and 
standardized tests (Carr & Farr, 2000; Farr, 2000; Waters, 2000). For teachers, however, 
grades have “face validity”; teachers are often more willing to accept grades over other 
assessments such as standardized tests because they assigned those grades based on their 
own assessments (NCREL guide to using data, 2004; Mehrens & Lehmann, 1991).

For schools and school districts, analyzing grading data for decision making is 
vitally important. Despite all of the known issues with grades and grading subjectivity 
addressed above, grades are used to make decisions that have direct impact on both 
students and schools. Grades are used to make decisions for special needs testing, to 
assign special education services, and to admit or channel students into specific 
curriculum tracks (Hargis, 1990; Langdon & Trumbull, 2000). For schools, these 
decisions impact not only finances, especially with special education decisions, but also 
the long-term success of students, including dropout, graduation and college admittance 
levels. As a result, it is crucially important that schools and districts examine grade data 
when making decisions that will impact the long-term success of students. The question 
of exactly how to examine that data in ways to help schools address these issues remains. 
CHAPTER III: THEORETICAL FRAMEWORK


While many issues concerning grades and grading need to be addressed, this
study focuses on two specific issues related to grades and their potential use in data- 
driven decision making for instructional improvement in schools. The first issue relates to 
the hypothesis of grade patterning and the second issue concerns classroom grade 
variance. 

Student Grade Patterning: Identification, Prediction, and Intervention 

As referred to above, the supposition has previously been made that due to the 
subjectivity of grades and the influence of teacher perceptions on grades, students who 
obtain high grades early on in schooling continue to get high grades throughout their 
school career and students assigned low grades may become trapped in a cycle of low 
expectations and grades (Hargis, 1990), termed the “Hargris hypothesis” for this study. It 
has also been postulated that student motivation, one of the primary goals of grades, only 
influences students who get high grades (Evans, 1976; Kirschenbaum et al., 1971). The
literature on the effects of teacher perception and expectancy on student gains supports 
this theory of early success, in that if positive teacher perception of a student’s ability 
does influence student gains, then that perception has the most influence in the early 
grades at the earliest times in the school year (Spitz, 1999). This idea of general early 
student grade patterns predicting future student grade patterns is shown for a hypothetical 
dataset of 8 students in Eigure 1 . This idea of student patterning, the Hargris hypothesis, 
has been detailed in the literature. Essentially, students who receive high grades in early 
elementary school are the students who continue to receive high grades throughout their 
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time in a school district {Figure 1, Students 3, 5 & 7), and students who receive low 
grades early on, may be locked into a cycle of low grading {Figure 1, Students 1, 2 & 6). 
These overall grade patterns have not been empirically demonstrated in the literature to 
occur over multiple years of schooling. 
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Figure 1: Theoretical grading trends for one 13 year cohort in mathematics. 

Although not empirically tested to date, the theory drawn from the literature of the 
Hargris hypothesis indicates that early on, students 3, 5 and 7 receive high grades and are 
thus motivated to continue to receive high grades throughout. Students 1, 2 and 6 initially 
receive low grades and are thus trapped in a cluster of low grading throughout their 
schooling career. It is unknown if students 4 and 8 exist on a large scale, whether they 
start high or low and finish in an opposite position, as well as how these different clusters 
of student patterns are similar and different from each other. More specifically, could 
student 4 have been recognized in 4th or 5th grade and had an instructional intervention 
designed to move the student back into the high scoring cluster? This figure is presented 
in color. 

If grade patterning in this fashion is occurring, and is more a function of 
subjective factors rather than actual student achievement (potential or realized), a better 
understanding of this type of previously unexplored grouping behavior could assist a 
school district in making more informed decisions about which students are considered 
underperforming, tracked into special needs assessment, or given access to gifted 
programs. Furthermore, a better understanding of how student grades pattern with other 
students within a classroom, cohort, school and district, combined with qualitative data 
such as district transfer status, gender, retention records, and test taking patterns, could 
help school leaders pinpoint previously unknown empirically derived subgroups of 
children who are in need of targeted interventions (Figure 1, Student 4). Such data could 
help inform teachers and administrators of which groups of children are succeeding or 
failing within the grading system, and what those children’s similarities are, in an effort 
to analyze what works and does not work in a district. Such information would enable a 
school district to help more children be successful. 

A potential statistical tool that could be used to study this type of group patterning 
is cluster analysis (Lorr, 1983; Rencher, 2002; Romesburg, 1984). In cluster analysis, 
group patterns can be empirically derived from both grading and standardized 
achievement test data. Group pattern trends can be used to predict future outcomes, such 
as using elementary school grades to examine whether or not the Hargris hypothesis is 
accurate. Another possible use is examining past grade patterns to predict qualitative 
student outcomes, such as on-time graduation. Since cluster analysis is rarely used in 
educational research, it shall be explicated at length in the methods. 

The Convergence of Grades and Standard Assessments 

The second issue addressed in this study is the issue of teacher induced classroom 
grade variance and the correlation of grades and standard assessments over time. As 
discussed above, the supposition has been made that teachers are confronted by student 
populations that have high variability in pre-knowledge and ability, and throughout the 
course of schooling the variability between students increases. This is attributed in part to 
teachers using one level of instruction (directed at the middle) and designing assessments 
that increase variability, each combined with the subjectivity and grouping patterns of 
grades (Hargis, 1990) discussed above. Even in the case of newer assessment strategies, 
such as portfolios or formative assessments used in combination with traditional 
assessments (Airasian, 1994), these issues of teacher subjectivity, perceptions and grade 
variance remain. However, with the rise of standard assessments and accountability, it is 
possible that a currently unexamined change is underway in instruction and teacher 
grading variance. 

As schools and school districts adapt to the introduction of state mandated 
standardized assessments, they are beginning to realign their curriculum to the state 
standards under community pressure to perform well as an organization on these 
assessments. By aligning grade report cards to standardized assessment reports, schools 
decrease the difference between the two reporting systems (Bisesi et al., 2000; Carr & 
Farr, 2000). Due to the criterion referenced nature of state standardized tests, teachers 
must adjust instruction to cover the curricular objectives that the test assesses (Falk, 
2002; Popham, 2004). Through alignment of curriculum to the tests, grades and 
standardized assessments may be converging into one system (Farr, 2000; Finn, 1982, 
2000). One hypothesis for this study is that teacher assessment writing, instruction and 
grading practices for core subjects at all grade levels are changing such that the variance 
between student grades is decreased, and the usefulness of grades in predicting 
standardized measures of academic achievement is increased. As teachers personalize 
and differentiate instruction for students who are perceived to be below the state 
curriculum criterion and grade students with teacher designed assessments that are 
aligned with the state standardized assessments, grades may be becoming more aligned 
with the state standard assessments. If true, this would decrease classroom variance as the 
low performing students are brought up to the criterion. As a result, grades would be 
better indicators of student academic knowledge. Of course this hypothesis takes as an 
assumption that standardized test scores are a valid assessment of student knowledge and 
academic achievement. A proposal of this study is that grading practices in the current 
era of accountability, as opposed to grading in the past, are becoming more aligned with 
standardized assessments as the two systems converge. If this is so, educational leaders 
could use either grades or standardized assessments to predict each other for the purpose 
of making decisions at the district, school and student levels. 

More specifically, this study investigates the correlation and distribution of grades 
and standard assessment scores for students within schools and districts at two time 
points. While not empirically tested, the literature on grades intimates that there may be 
little correlation between grades and standardized assessments. However, this has not 
been examined closely in the literature, due in part to the known high subjectivity of 
grades and the difficulty of obtaining large datasets of student subject-specific and grade- 
level grading histories. 

Figure 2: Hypothesized change in grade distributions from before and after the 
implementation of criterion referenced tests. 

Students in the past may have been awarded grades along a normal distribution (solid 
curve). After the implementation of criterion referenced standardized tests, pacing guides 
and curriculum alignment, students in classrooms that have obtained 100% passing of the 
criterion (dashed line) are now hypothesized to have a grade distribution that is skewed 
somewhat higher (dashed curve). 


While also not well studied empirically, the idea that teachers assign grades that 
distribute students within a normal curve, either purposefully or unintentionally, has been 
hypothesized (Carr, 2000; Cross & Frary, 1999; Hargis, 1990; Kirschenbaum et al., 1971) 
(Figure 2). If teacher grading variance has changed since the introduction of criterion 
referenced exams and if grades and standardized assessments are becoming more aligned 
(Figure 2), it is a hypothesis of this study that the distribution of grades is beginning to 
have an increase in positive skew as teachers concentrate instruction on students below 
the criterion level, necessarily raising their achievement in all aspects in that classroom, 
and thus the students’ grades (Figure 2, compare the solid and dashed curves). 

Concurrently, with the implementation of criterion referenced exams and the 
pressure to align curriculum and classroom practice to state guidelines, grades and 
standard assessments may be becoming more strongly correlated, in which case, grades 
and standardized assessments are becoming more predictive of each other. One intent of 
this study then, is to compare the extent to which grades and standard assessments were 
correlated at a past date with correlations using current data (Figure 3). As shown in 
Figure 3, currently little is known about the correlation between grades and standard 
assessments and if that correlation has changed over time. By examining student subject- 
level grades and state standard assessment scores over time, it can be determined if a 
change in the correlation of the two assessment systems has taken place over time. 
However, the cause of the change would still be unknown (Figure 3). 

If grades have become more correlated with standardized assessments, this would 
have many implications for schools and districts engaged in data driven decision making. 
One topical implication would be that districts would have less of a requirement for 
additional district designed and proctored pre-standardized tests (periodic tests) designed 
to predict how well a student population will perform on upcoming state mandated tests. 
Grades alone may sufficiently predict standardized test scores, decreasing the amount of 
time devoted to pre-standardized assessment preparation, proctoring and evaluation. 

[Figure 3 diagram. Previously: grading distribution, variance and correlation to standard assessments unknown in both the past and the present, with unknown causality for any change. At study completion: grading distribution, variance and correlation to standard assessments known in both the past and the present for two districts, with hypothesized causality for change (standard assessments).] 


Figure 3: Hypothesized scope of the study of the change in grading variance and 
alignment. 

Previous to the present study, grade distribution, variance and grade correlation to 
standard assessments for core subjects were unknown both in the past and currently. At 
completion of the proposed study, grade distribution, variance and correlation to standard 
assessments will be known for two districts. While causality will not be explored in the 
study, the hypothesis that standard assessments have led to a change in grade variance is 
suggested. 


Framework Conclusion 

Again, the question driving this proposal is: Can grades be used in data driven 
decision making? The proposed study will address this question in two parts (see Figure 
4). First, I evaluate correlations of teacher assigned grades and standardized assessments 
and explore whether or not the correlations have strengthened over time. I argue that a 
strong positive correlation could increase the potential that district leaders and teachers 
could use grades as valid data measures of achievement in decision making (Figure 4, left 
column). Second, I examine student grade patterns to cluster students and understand how 
past student grades predict future student outcomes. Specifically, I hope to pinpoint 
specific times and subjects for early instructional intervention for specific students 
(Figure 4, right column). 


[Figure 4 diagram. Basic question: Can grades be used for data driven decision making? Research proposal, left column: correlation of grades and standardized assessments; results: correlations in the past compared with correlations currently; implication: if more correlated, grades could be better used for test and achievement prediction. Research proposal, right column: using past student grade cluster patterns to predict current student intervention needs; results: past patterns; implication: indicates precise intervention points for specific students in specific subjects and grades.] 


Figure 4: General Flow of the Dissertation Framework. 

The proposal of this dissertation is to study if grades can be better used for data driven 
decision making in K-12 schools. The general flow of the proposed questions and data is 
presented, with hypothesized results and general implications. 

Research Questions 


1. To what extent has the correlation between grades and standardized assessments 
changed from earlier student cohorts to more recent cohorts? 

2. To what extent does the Hargris hypothesis of past grading patterns predicting 
future student grade patterns hold true? 

3. To what extent is grade patterning useful in predicting student outcomes such as 
graduation or dropping out? To what extent do these predictive patterns aid in 
identifying avenues for early intervention by instructional leaders and teachers? 

CHAPTER IV: METHOD 


Sample 

For this study, the entire assessment histories of a sample of students were 
collected, including grades and standardized assessments. The sample of students 
comprised all of the students of the entire graduating cohorts of 2006 and 1994 
(whether or not they graduated) for two districts, West Oak and South Pine 
(pseudonyms). Districts were selected based on their comparatively small sizes (fewer than 
3000 students each) to keep the study at a reasonable size for a single researcher to 
complete data collection over a three month time period, their relative diversity in student 
populations, and their willingness to participate in the study. Both districts are located in 
the American Midwest, are located within 20 miles of each other, and are first ring 
suburbs of a large metropolitan area. In addition, both districts are currently undergoing 
dramatic demographic changes as their populations shift from a majority European 
American demographic to an increasing population of Hispanic and African American 
families. For issues of confidentiality, district specifics are intentionally left vague. 

West Oak is defined as a mid-sized central city by the U.S. census, with fewer than 
3000 students attending two elementary schools, a middle school and a high school. In 
2006, the district served a student population that was about 70% economically 
disadvantaged, 50% Hispanic, 30% European American and 15% African American. The 
district has historically lagged behind the state averages on state standardized tests in 
both reading and mathematics at all grade levels (NCES, 2006; S&P, 2006). 

South Pine is defined as an urban fringe of a mid-sized city by the U.S. census, 
with fewer than 3000 students attending three elementary schools, a middle school and a 
high school. In 2006, the district served a student population that was about 50% 
economically disadvantaged, 50% European American, 20% Hispanic, and 15% African 
American. The district has historically scored near the state averages on state 
standardized tests in both reading and mathematics at all grade levels (NCES, 2006; S&P, 
2006). 

Data Collection 

Students were included in the sample if they started first grade with the student 
cohort expected to graduate from high school in May of either 1994 or 2006. For both 
districts, the first grade school year was 1982/1983 for the graduating class of 1994, and 
school year 1994/1995 for the graduating class of 2006. Two cohorts were selected in 
each of the two districts to provide an initial comparison of grading and standardized 
assessments over time. The 2006 cohort was selected as the most recently graduated 
cohort from each district. The 1994 cohort was selected because it was the oldest cohort 
in West Oak for which student data files contained both grading histories and state 
standardized test score records. For comparison, the graduating cohort of 1994 was also 
included for South Pine. Thus, four cohorts of students comprise the sample. 

Each student’s permanent record in paper form was accessed from the district’s 
long-term paper file storage. Student data were entered into SPSS, using a unique 
identifier to de-identify each student. No student names were recorded for this study. For 
each student, grades for every subject for every year were recorded, K through 12. 
Additionally, scores for each standardized test on record were recorded, including 
composite and subject specific scores. Standardized tests included the state standardized 
tests for grades 3, 4, 5, 6, 7, 8, and 10 as well as the ACT. Because it was outside of the 
scope of this study, attendance was not recorded. For students who transferred into the 
district on track to graduate in either 1994 or 2006, if the student’s file contained grades 
and assessment data from their past school district, those grades were also recorded. 

For high school grades, both the letter grade and the name of each class taken 
were recorded for each semester for grades 9 through 12, providing a rich dataset in which 
both the subject grades and the name of each subject-level class were recorded for 
each semester at each grade level. Classes were grouped by the following subjects: 
Mathematics, English, Science, Foreign Language, Social Science, Economics, Band, 
Physical Education/Health, Computers, Life Skills, Family Skills, and Art. Accordingly, 
multiple class grades of the same subject but for different classes were recorded within 
the same subject and grade level variable, so that for each student one column of data was 
recorded as the different class names for a subject during a specific grade level and 
semester, and the next column was the letter grade for that subject in that grade level and 
semester. As an example, the data recorded for first semester 10th grade mathematics 
class name for all cohorts included classes such as Algebra, Geometry, Trigonometry, 
and Math Skills, among others. The letter grade for each student for each of these 
different classes was recorded under the variable name “Math Grade 10 Semester 1”. 
This was repeated for all high school classes. 

Bowers, A.J. (2007) 

Because the two districts over the two time periods recorded Middle School and 
elementary grades differently by semester, some recording just the final yearly grade and 
some recording by semester, and also because some of the schools had different semester 
schedules in which the 180 day school calendar was divided into 2, 3 or 4 semesters, all 
Middle School and elementary grades were recorded as “composite grades”. To generate 
the composite grade, letter grades for each subject for each semester recorded were first 
converted to the following numeric grading scale: A = 4.0, A- = 3.666, B+ = 3.333, B = 
3.0, B- = 2.666, C+ = 2.333, C = 2.0, C- = 1.666, D+ = 1.333, D = 1.0, D- = 0.666, E or F 
= 0. Then, the mean grade for that school year was calculated from the numeric grades to 
generate the composite grade. Composite grades were then entered into SPSS similarly to 
the high school grades by subject. 
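
The composite grade calculation described above can be illustrated with a minimal sketch. The numeric scale is the one listed in the text; the function name `composite_grade` and the handling of unrecorded grading periods (skipped, marked here as `None`) are assumptions made for illustration, not a description of the SPSS procedure used in this study.

```python
from statistics import mean

# Letter-to-numeric scale as listed in the text (E and F both convert to 0).
GRADE_POINTS = {
    "A": 4.0, "A-": 3.666, "B+": 3.333, "B": 3.0, "B-": 2.666,
    "C+": 2.333, "C": 2.0, "C-": 1.666, "D+": 1.333, "D": 1.0,
    "D-": 0.666, "E": 0.0, "F": 0.0,
}


def composite_grade(period_grades):
    """Mean numeric grade for one subject in one school year.

    period_grades holds the letter grades recorded for each grading period.
    Assumption: a period with no recorded grade is given as None and skipped.
    """
    points = [GRADE_POINTS[g] for g in period_grades if g is not None]
    if not points:
        return None  # nothing recorded for this subject and year
    return mean(points)


# Hypothetical student with three recorded grading periods in one subject.
print(composite_grade(["B+", "A-", "B"]))
```

Because the composite is a simple mean, a school with two grading periods and a school with four produce values on the same 0 to 4 scale.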

Although course names at the elementary level were fairly consistent across 
districts, time periods and report cards, early elementary grading marks were not. This 
posed an interesting dilemma as to how to record subject specific grades for each student 
at each grade level for grades K (kindergarten) through 3. Table 1 presents the different 
grade marking scales identified from the various report cards for grades K through 3. 
Interestingly, while few report cards for these grades used the more standard A, B, C, D 
grading scale, all conformed to some form of a four point scale. No matter the scale used, 
from pluses and checks, to V, S, N, O, to 1, 2, 3, 4, teachers awarded students based on a 
four point scale that mirrored the classic A, B, C, D scale. Interestingly, except for the one 
report card in the sample that used the symbol grading scale, teachers commonly used the 
+/- designations to represent a degree of achievement between scoring ranks. As 
examples, with the VSNO scale, V- or S+ was a common designation, or with the 1, 2, 3 
scale a 1- or a 2+, indicating a mark between the top mark and next highest mark (Table 
1). With the grading scales mirroring the traditional four-point scale, each grading 
period’s mark by subject was converted to a numeric grade according to the scheme 
presented in Table 1. The mean for all of the grading periods for each subject in a specific 
grade level was then recorded. 

Table 1: A Four-Point Grading Scale and the Differential Grading Marks of Elementary 
Teachers, Grades K-3 

Scale                 Marks, highest to lowest
Standard              A    A-   B+   B    B-   C+   C    C-   D
Check                 +    +-        ✓    ✓-   -+   -    --   O
VSN                   V    V-   S+   S    S-   N+   N    N-   O
OSN                   O    O-   S+   S    S-   N+   N    N-   SE
123                   1    1-   2+   2    2-   3+   3    3-   4
ABCN                  A    A-   B+   B    B-   C+   C    C-   N
ABPH                  A    A-   B+   B    B-        P    P-   H
Symbol                ^              O              X         p
Numeric Conversion    4    3.6  3.3  3    2.6  2.3  2    1.6  1

Symbol Key: 

V - Very good, S - Satisfactory, N - Needs Improvement 
1 - Excellent progress, 2 - Progressing at expected level, 3 - Needs to improve, 4 - Special needs 
^ - Demonstrates effectively, O - Demonstrates some, X - Working, p - Does not demonstrate 
P - Progressing, H - Help needed, SE - See comments 
Note: “O” was used differently for multiple scales 
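
As an illustration of how the conversion scheme in Table 1 could be applied, the sketch below looks up a mark's position within its own scale and returns the numeric value in the same column. Only the VSN and 123 rows are included; the names `SCALES`, `k3_numeric` and `k3_subject_grade` are hypothetical, and the structure is an assumed convenience rather than the procedure actually used for data entry.

```python
from statistics import mean

# Numeric Conversion row of Table 1, highest mark to lowest.
NUMERIC = [4.0, 3.6, 3.3, 3.0, 2.6, 2.3, 2.0, 1.6, 1.0]

# Two of the report card scales from Table 1, in the same column order.
SCALES = {
    "VSN": ["V", "V-", "S+", "S", "S-", "N+", "N", "N-", "O"],
    "123": ["1", "1-", "2+", "2", "2-", "3+", "3", "3-", "4"],
}


def k3_numeric(scale, mark):
    """Convert one K-3 report card mark to its four-point numeric value."""
    return NUMERIC[SCALES[scale].index(mark)]


def k3_subject_grade(scale, marks):
    """Mean numeric value over all grading periods for one subject and grade level."""
    return mean([k3_numeric(scale, m) for m in marks])


# Hypothetical student marked S+, S and V- in reading over three grading periods.
print(k3_subject_grade("VSN", ["S+", "S", "V-"]))
```

Keeping each scale in the same column order as the numeric row means a new report card format only requires one additional list.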

Additional variables were also recorded for each student, including gender, date 
of birth, ethnicity, and student transfer status, both in and out of the district. The issue of 
the designation of “dropout” is highly contested in the literature (Greene & Winters, 
2005; NCES, 2004; Swanson, 2004; Viadero, 2006) and official definitions differ by state 
and by region. Nevertheless, many students who were on track to graduate on-time with 
their cohort in this sample did not. Because the term “dropout” is currently under 
contention in the literature and policy domains, for this study, as has been previously 
recommended (Ensminger & Slusarcick, 1992; Marrow, 1986), students were designated 
as either On Time Graduation - students who had evidence of receiving a diploma on- 
time with their cohort or had evidence of a valid transfer out of the district - or Not On 
Time Graduation (NOTG). 

A student was considered to have graduated on-time if their record contained 
evidence of the award of a diploma. A valid student transfer was defined as any student’s 
record which contained a request for student transcripts from another school district or 
school which was not an alternative school. Although a student who transferred to 
another district may have eventually dropped out, there is no way to determine this, and 
as has been previously recommended (Ensminger & Slusarcick, 1992; Marrow, 1986), 
valid transfer students are designated as on-time graduators. 

A record of a transcript request from an alternative school was defined as a non- 
valid indicator of student transfer for on-time high school graduation, and thus was an 
indicator of the educational challenges faced by the student with a high probability that 
the student would not graduate on-time with their cohort. Lacking confirming graduation 
or alternative degree completion data from the alternative education schools, it cannot be 
determined if the students who transferred to alternative education programs graduated 
on-time with their cohort with a full high school diploma, rather than a G.E.D. It is the 
case that many students who transferred to alternative high schools had low or failing 
grades in multiple subjects at the time of the transfer. Past research on the G.E.D. option 
has shown that it is not equivalent to a regular high school diploma (Cameron & 
Heckman, 1993; Tyler, 2003) and thus is not considered for this study as on-time 
graduation with a standard high school diploma. Even if these students did graduate from 
an alternative high school with a diploma or an alternative high school degree (G.E.D.), 
this study is focused on the on-time graduation of the cohort of students in a traditional 
high school program, and thus considers students who transferred to an alternative 
education program as NOTG. Interestingly, it has been shown previously that students 
identified as “at risk” for dropping out are often directed to an alternative education 
program by district personnel before they drop out (Sipple et al., 2004), and that the 
inclusion of a G.E.D. option may encourage students to drop out (Tyler, 2003). If a 
student’s file contained neither a record of a diploma award nor a request for student records 
from another district, or if the record ended prematurely, that student was designated as not 
on time graduation (NOTG). Thus, NOTG should be considered as a “proxy” for student 
dropout that may contain some unknown degree of false positives: students who are 
categorized as NOTG but did graduate on time. 
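
The designation rules above amount to a short decision procedure, sketched below. The record fields (`has_diploma`, `transcript_request`, `request_from_alternative_school`) and the function name are hypothetical stand-ins for whatever evidence a paper file contains; they are not variables from the study's dataset.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    # Hypothetical fields summarizing the evidence found in a student's file.
    has_diploma: bool = False                      # record of an on-time diploma award
    transcript_request: bool = False               # another school requested transcripts
    request_from_alternative_school: bool = False  # the requester was an alternative school


def graduation_status(record):
    """Designate a student as On Time Graduation or NOTG following the stated rules."""
    if record.has_diploma:
        return "On Time Graduation"
    # A transcript request counts as a valid transfer only when it does not
    # come from an alternative school.
    if record.transcript_request and not record.request_from_alternative_school:
        return "On Time Graduation"
    # No diploma and no valid transfer, including records that end prematurely.
    return "NOTG"


print(graduation_status(StudentRecord(has_diploma=True)))
print(graduation_status(StudentRecord(transcript_request=True,
                                      request_from_alternative_school=True)))
```

An empty record falls through to NOTG, which is how the proxy can include students who in fact graduated on time elsewhere.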

Statistical Analysis 

All data entry and statistical analyses, except for cluster analysis (see below) and 
calculation of confidence intervals between correlations (see Appendix B), were 
conducted using the statistical software package SPSS 14 (SPSS, 2006). For the purposes 
of statistical analysis in this study, subject specific grades are considered as ordinal 
variables while GPA, state standardized test scale scores and ACT subject specific and 
composite scores are considered as interval scales. Thus, when correlating subject 
specific grades to other measures, a nonparametric statistic, Spearman’s Rho, is utilized. 
However, when correlating interval measures, Pearson product moment correlations are 
used where indicated (Howell, 2002). To calculate confidence intervals between two 
independent correlations, Fisher’s r to z transformation and p were utilized (Howell, 
2002) (see Appendix B). 
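
For readers unfamiliar with the r to z transformation, a minimal sketch of the standard textbook comparison of two independent correlations follows. It is offered as an illustration only: the function names, the 95% critical value and the example figures are assumptions, and the exact calculation used in this study is the one given in Appendix B.

```python
import math


def fisher_z(r):
    """Fisher's r to z transformation, z = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def compare_independent_correlations(r1, n1, r2, n2):
    """z statistic and two-sided p for the difference between two independent correlations."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / standard_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail area of the standard normal
    return z, p


def correlation_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for one correlation, back-transformed to r."""
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(fisher_z(r) - half_width), math.tanh(fisher_z(r) + half_width)


# Hypothetical correlations of grades with test scores in two cohorts of 200 students.
z, p = compare_independent_correlations(0.70, 200, 0.40, 200)
print(round(z, 2), p < 0.05)
print(correlation_ci(0.70, 200))
```

The transformation is needed because the sampling distribution of r is skewed when the correlation is far from zero, whereas z is approximately normal with standard error 1/sqrt(n - 3).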

Cluster Analysis 

To test the grade patterning hypothesis of the ability of past student grade patterns 
to predict future student outcomes based on current student grades, cluster analysis was 
used to identify the underlying patterns within the K-12 grading dataset. To supply the 
clustering procedure with ample data as well as the subsequent analysis of the results, the 
entire grading histories recorded within the dataset were included where appropriate. 

Cluster analysis is a descriptive statistical analysis that brings empirically defined 
organization to a set of previously unorganized data (Anderberg, 1973; Eisen et al., 1998; 
Jain & Dubes, 1988; Lorr, 1983; Rencher, 2002; Romesburg, 1984; Sneath & Sokal, 
1973). There are two types of clustering, supervised and unsupervised. Supervised 
clustering begins with a defined set of assumptions about the categorization of the data, 
while unsupervised clustering assumes nothing about the categorization and is designed 
to statistically discover the underlying structure patterns within the dataset (Kohonen, 
1997), a procedure well suited to discovering the underlying patterns within student 
grades. While there are many types of unstructured cluster analyses (Anderberg, 1973; 
Lorr, 1983; Romesburg, 1984; Sneath & Sokal, 1973), this study will focus on 
hierarchical cluster analysis, due to the procedure’s ability to discover a taxonomic 
structure within a dataset efficiently (Lorr, 1983; Rencher, 2002; Romesburg, 1984; 
Wightman, 1993). 

Hierarchical clustering provides a way of organizing cases based on how similar 
the values for the list of variables are for each case. In hierarchical clustering, each case is 
first defined as an individual cluster, a series of numbers for each variable on that case. 
As an example, this could be a single student’s grades in all subjects K-12. Then, at each 
level of clustering, the two most similar cases are joined based on how similar the pattern 
of numbers is for both cases, as defined by a similarity distance measure, discussed 
below. This continues in a hierarchical fashion as similar cases are joined to clusters and 
clusters are themselves joined to similar clusters, until the clustering algorithm defines 
the entire dataset at the highest hierarchical level as one cluster (Anderberg, 1973; Eisen 
et al., 1998; Lorr, 1983; Rencher, 2002; Romesburg, 1984; Sneath & Sokal, 1973). Thus, 
when complete, cases that were previously organized just as a pseudo-random descriptive 
list, organized alphabetically or by student numbers, are placed nearby other cases in the 
list with which they have a high similarity, aiding in visualization and identification of 
empirically defined patterns previously unknown within the dataset. To date, while few 
studies in education use clustering, those that have describe their clustering results in 
many varied ways (Sireci et al., 1999; Wightman, 1993; S. Young & Shaw, 1999). 
Descriptions range from unintuitive, to verbose, to difficult to interpret. One way to help 
visualize the organization of the data by hierarchical clustering is to draw a cluster tree, 
sometimes referred to as a dendrogram (Eisen et al., 1998; Lorr, 1983; Romesburg, 
1984). Within a cluster tree, clusters of cases and clusters of clusters can quickly be 
identified by the closeness of lines corresponding to cases and linked to other cases. The 
unit length of the line indicates similarity of patterns, the distance in the data space 
between the two clusters in the units of the measure, with a shorter line denoting higher 
similarity. 
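
The joining procedure just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the configuration used in this study: Euclidean distance and average linkage are chosen here only because they are common defaults, the five students and their numeric grades are hypothetical, and the function names are invented for the example.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Distance between two cases in the data space (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, data):
    """Mean distance over all pairs of cases drawn from two clusters (assumed linkage)."""
    distances = [euclidean(data[i], data[j]) for i in c1 for j in c2]
    return sum(distances) / len(distances)


def hierarchical_cluster(data):
    """Join the two most similar clusters repeatedly until one cluster remains.

    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = [(case_id,) for case_id in sorted(data)]  # each case starts alone
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], data))
        distance = average_linkage(a, b, data)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, distance))
    return merges


# Hypothetical students, each with numeric (English, Mathematics) grades.
grades = {
    "s1": (1.0, 1.0),
    "s2": (3.3, 4.0),
    "s3": (4.0, 2.0),
    "s4": (4.0, 2.3),
    "s5": (1.0, 2.0),
}
for a, b, distance in hierarchical_cluster(grades):
    print(a, "+", b, "at distance", round(distance, 3))
```

The recorded merge distances are what a cluster tree draws as line lengths, so the earliest and shortest joins correspond to the most similar grade patterns.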

Recently, researchers in the biological sciences, specifically molecular biology, 
where the human genome project has produced massive amounts of data, have made 
innovations in using and visualizing hierarchical clustering. For unordered 
and unintuitive displays of datasets that include tens of thousands of genes with 
thousands of data points for each gene in multiple samples, traditional techniques are 
unworkable. One quickly adopted innovation was the Eisenplot. First invented by 
Michael Eisen at Stanford, the Eisenplot takes tables of clustered numbers, which the 
human mind cannot easily interpret for pattern recognition, and converts the table into 
blocks of color, aiding the human eye in visualizing patterns within clustered data (Eisen 
et al., 1998). In addition, while traditional statistical program packages such as SAS 
(using PROC CLUSTER) and SPSS do include clustering algorithms, due to the 
explosion of genetic data and the near ubiquitous use of hierarchical clustering by 
molecular biologists, clustering programs and visualization software are now also freely 
available on the internet (DeHoon et al., 2004; Eisen, 1998; Eisen & DeHoon, 2002; 
Vilo, 2003). 

Because cluster analysis has rarely been used in educational research, a simplified
example of the clustering procedure is informative to detail the process and the
algorithms used. To allow for initial visualization in two dimensional space, and to keep
the example relatively brief, the example data set includes five fictitious students,
numbered 1 through 5, with 8th grade English and Mathematics grades (Table 2).

Table 2: Example 8th Grade Dataset, English and Mathematics Letter Grades for 5 Students

Student ID    English Grade    Mathematics Grade
1             D                D
2             B+               A
3             A                C
4             A                C+
5             D                C

Visual inspection of the letter grading patterns between the five students is difficult, as
the list is ordered only by the student number. Imagine if the data set were to include the
data of an entire cohort with grades in many more subjects for multiple years. Discerning
patterns in the grading data would be impossible without the aid of a clustering method.

The example clustering method detailed here is an adapted version of
Romesburg’s overview of cluster analysis (1984). I substitute subject specific grading
data into this analysis, along with the addition of an Eisenplot as the final step in cluster
visualization, which is not discussed by Romesburg.

Hierarchical cluster analysis for use with grade data first requires that student
letter grades be converted to a four point scale (for the conversion scheme, see Table 1).
For the example hypothetical data, the letter grades data in Table 2 are converted to
numeric grades data in Table 3.

Table 3: Example 8th Grade Dataset, English and Mathematics Numeric Grades for 5 Students

Student ID    English Grade    Mathematics Grade
1             1.0              1.0
2             3.6              4.0
3             4.0              2.0
4             4.0              2.3
5             1.0              2.0


This dataset can be visualized in two dimensions using a scatter plot (Figure 5).

Figure 5: Scatter plot of example data set


In the Figure 5 scatter plot, each student is identified by student ID, and each student’s
numeric grade (Table 3) is plotted in the two dimensional “data space”, which is an
intersection of the English and Mathematics grade data. From this plot, the proximity of
each student’s data pattern in the data space can be visualized. If the dataset were to
include hundreds of student cases, rather than five, and hundreds of subject specific
grades, rather than two, visualization in this manner would quickly become impossible, as
the cases would overlap and the number of dimensions of data would surpass
three. Cluster analysis, and the resulting
visualization techniques, enables the quantification of case proximity within a multi-
dimensional data space as a measure of similarity between cases, as well as dendrograms
and Eisenplots to aid in pattern recognition and visualization of the clustering results.

In cluster analysis, the goal is to determine quantitatively how similar each case’s
data pattern is to every other case, cluster the two most similar cases’ data patterns, and
repeat the process until the entire dataset is defined as a single cluster, thus determining
the hierarchical data structure within the data. This is accomplished through the following
eight steps:

1. Create a resemblance matrix by calculating a distance measure between every
pair of cases.

2. Combine the two most similar cases into a cluster.

3. Use a clustering algorithm to recalculate the resemblance matrix.

4. Iterate over steps 2 and 3 until all of the cases are clustered into one cluster, i.e.,
n-1 times.

5. Rearrange the order of the cases on the basis of their similarity according to the
results of step 4.

6. Draw the dendrogram.

7. Draw the Eisenplot.

8. Interpret the clusters.

For step 1, a distance measure between every point in Figure 5 must be calculated.
This can be accomplished through a variety of methods (Anderberg, 1973; Lorr, 1983;
Rencher, 2002; Romesburg, 1984; Sneath & Sokal, 1973). To present a simplified
example, the Euclidean distance will be used for the hypothetical dataset. The distance
between each case can be represented as a dashed line, as shown in Figure 6.

Figure 6: Right triangles drawn between each example data point in the data space

The length of each line is a measure of the similarity of each case to each other case in
the two dimensional grade data space. To calculate the length of each line, the
generalized Pythagorean theorem (Euclidean distance) can be used, in which each line is
considered the hypotenuse of a right triangle and the length of the hypotenuse is
determined through the formula $a^2 + b^2 = c^2$. The more general form of this equation, the
Euclidean distance, for any two series of numbers in which $x = \{x_1, x_2, \ldots, x_n\}$ and
$y = \{y_1, y_2, \ldots, y_n\}$ is defined as:

\[ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \]

Equation 1

Using Equation 1, a lower left triangular resemblance matrix can be generated for the
example data, as shown in Table 4.


Table 4: Example data resemblance matrix, step 1, iteration 1

              Student ID
Student ID    1       2       3       4       5
1
2             4.01
3             3.16    2.03
4             3.28    1.70    0.33
5             1.00    3.33    3.00    3.02


Each cell in Table 4 is the calculated Euclidean distance, using Equation 1, between each
of the five students in the data space in grading units. At this point, each student is
considered a cluster, and in the subsequent steps, will be grouped into larger clusters with
students who have similar data patterns.
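
As a hedged illustration of step 1, the following Python sketch computes the lower left
triangular resemblance matrix with the Euclidean distance of Equation 1. The names `grades`,
`euclidean` and `resemblance_matrix` are illustrative and are not part of the original
analysis. One assumption is made about the data: the distances in Table 4 appear to be
reproduced to two decimals only when student 2's English grade and student 4's
mathematics grade are entered as 3.666 and 2.333, rather than the 3.6 and 2.3 printed in
Table 3, so those values are used here.

```python
import math

# Numeric grades (English, Mathematics) for the five example students.
# ASSUMPTION: 3.666 and 2.333 are used in place of the 3.6 and 2.3 printed in
# Table 3, because these values appear to reproduce the distances in Table 4.
grades = {
    1: (1.0, 1.0),
    2: (3.666, 4.0),
    3: (4.0, 2.0),
    4: (4.0, 2.333),
    5: (1.0, 2.0),
}


def euclidean(x, y):
    """Euclidean distance between two equal-length series (Equation 1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def resemblance_matrix(data):
    """Lower-triangular matrix as a dict keyed by (row ID, column ID), row > column."""
    ids = sorted(data)
    return {(i, j): euclidean(data[i], data[j]) for i in ids for j in ids if i > j}


matrix = resemblance_matrix(grades)
for (i, j), distance in sorted(matrix.items()):
    print(f"students {i} and {j}: {distance:.2f}")

# Step 2 starts from the smallest entry of this matrix.
closest_pair = min(matrix, key=matrix.get)
print("most similar pair:", closest_pair)
```

Storing only the entries with a row ID greater than the column ID mirrors the lower
triangular layout of Table 4; a full symmetric matrix would carry the same information twice.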

The second step is to combine the two cases with the shortest distance into a new
cluster. The smallest distance measure in the example is between student 3 and student 4,
0.33 (Table 4). This is intuitive from the relative distance seen between these two cases in
Figures 5, 6 and 7. Students 3 and 4 are “closest” in the data space, and so should be the
first two cases to cluster together. In this fashion, the first cluster is defined as cluster 34,
and Figure 6 can be updated to show this cluster, as in Figure 7.

Figure 7: Example data set, cluster 34 defined


The third step is to employ a clustering algorithm. Many algorithms
have been suggested in the literature; however, the average linkage method is known to
provide good results and is accepted as a standard clustering algorithm (Eisen & DeHoon,
2002; Lorr, 1983; Rencher, 2002; Romesburg, 1984; Sneath & Sokal, 1973). It is the
clustering algorithm utilized for this study. For average linkage, if d(x,y) is equal to
Equation 1, the distance between any two clusters A and B is defined as the average
of the $n_A n_B$ pairwise distances between the $n_A$ cases in cluster A and the $n_B$
cases in cluster B, such that:

\[ D(A, B) = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} d(x_i, y_j) \]

Equation 2

where the sum is over all of $x_i$ in A and all of $y_j$ in B. For each step of the clustering, the
two clusters with the smallest distance are joined and the resemblance matrix is
recomputed according to Equation 2. For the example then, the updated resemblance
matrix is shown in Table 5.

Table 5: Example data resemblance matrix, step 3, iteration 2

              Cluster ID
Cluster ID    1           2           5           34
1             *
2             4.01        *
5             1.00        3.33        *
34            3.222547    1.863914    3.009212    *


The two clusters with the smallest Euclidean distance as calculated using Equation 2 are
cases 1 and 5. These two cases are then combined into cluster 15. The data space may
now be represented as in Figure 8.

Figure 8: Example data set, cluster 34 and 15 defined

Repeating step 4, the resemblance matrix is recalculated using Equation 2. The matrix
results are shown in Table 6.


Table 6: Example data resemblance matrix, step 3, iteration 3

              Cluster ID
Cluster ID    2       34      15
2
34            1.86
15            3.67    3.12


From Table 6, the two most similar clusters are clusters 2 and 34, with a distance of 1.86.
These two clusters are combined into a new cluster, 234. The data space can now be
represented as in Figure 9.

Figure 9: Example dataset, cluster 234 and 15 defined

The final iteration of the clustering algorithm to update the resemblance matrix gives the
data in Table 7, and the entire example data set is included in the final cluster, 12345.

Table 7: Example data resemblance matrix, step 3, iteration 4

              Cluster ID
Cluster ID    15      234
15            *
234           3.30    *
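
Steps 2 through 4 can be sketched as a short loop. The following Python example is a
minimal illustration under stated assumptions, not the software used in this study: the
function names are hypothetical, the cluster distance is computed directly from the case
grades with Equation 2 at every iteration rather than by updating a stored matrix, and the
grades 3.666 and 2.333 are assumed in place of the 3.6 and 2.3 printed in Table 3 because
they appear to reproduce the tabulated distances.

```python
import math
from itertools import combinations

# ASSUMPTION: 3.666 and 2.333 stand in for the 3.6 and 2.3 printed in Table 3.
grades = {
    1: (1.0, 1.0),
    2: (3.666, 4.0),
    3: (4.0, 2.0),
    4: (4.0, 2.333),
    5: (1.0, 2.0),
}


def euclidean(x, y):
    """Euclidean distance between two cases (Equation 1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_linkage(cluster_a, cluster_b, data):
    """Mean of all pairwise case distances between two clusters (Equation 2)."""
    total = sum(euclidean(data[i], data[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(data):
    """Join the two closest clusters until one remains; return merges and case order."""
    clusters = [(case,) for case in sorted(data)]  # every case starts as a cluster
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], data),
        )
        height = average_linkage(a, b, data)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, height))
    return merges, clusters[0]


merges, case_order = agglomerate(grades)
for a, b, height in merges:
    print(f"join {a} and {b} at distance {height:.2f}")
print("cluster-ordered cases:", case_order)
```

With five cases the loop runs n-1 = 4 times, and the distance recorded at each join is the
value that sets the corresponding branch length in a dendrogram.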


Therefore, through these steps, the unordered list of student IDs can now be
reordered by the similarity of each example student’s grade pattern in 8th grade English
and mathematics, as 1, 5, 2, 3, and 4. Cluster analysis also provides a means to visualize
this order, and the relative magnitude of the difference or similarity between clusters,
known as a dendrogram, or a cluster tree (Eisen et al., 1998; Rencher, 2002; Romesburg,
1984), as shown in Figure 10 for the example data set.

[Cluster tree (dendrogram) with branches labeled Iteration 1 through Iteration 4;
x-axis: Euclidean Distance in Numeric Grades, 0.0 to 4.0]

Figure 10: Example dataset dendrogram

The cluster tree “grows” with each iteration of the clustering algorithm. The x-
axis represents the Euclidean distance between each data point in the data space in
numeric grade units. The y-axis represents each student cluster. The length of each
horizontal line in the tree is defined by the distance measure calculated at that iteration
for that cluster. So, for the most similar cluster, 34, which was defined during the first
iteration, the distance is 0.33 (Table 4). The next most similar cluster is cluster 15, with a
distance of 1.0 (Table 5). Clusters 34 and 2 are linked in the tree at height 1.86 because
these two clusters were the next most similar at iteration 3 (Table 6). The final height of
the tree is defined by the final calculation in the resemblance matrix in iteration 4, 3.30
(Table 7). Thus, the dendrogram allows for the visualization of the order and magnitude
of the similarity of each student, based on the clustering of each student’s grade pattern
within the multi-dimensional data space.

Step 7 is a more recent addition to cluster visualization, the inclusion of an
Eisenplot, pioneered in molecular biology and cancer research (Bowers et al., 2000;
Eisen et al., 1998; van'tVeer et al., 2002; Weinstein et al., 1997). In this step, each
student’s grade patterns are converted into blocks of color, aiding the human eye in
pattern identification across multiple cases and multiple data patterns (Eisen et al., 1998;
Weinstein et al., 1997). Thus, multiple images in this dissertation are presented in
color. In addition, categorical data that may be informative in interpreting clustering
patterns can be visualized along with each case’s data pattern. For the example data,
adding a hypothetical categorical variable such as “on time graduation” to the data
presented in Table 3 would result in the data shown in Table 8.

Table 8: Example 8th grade dataset, English and mathematics numeric grades and one
categorical variable

Student ID    English Grade    Mathematics Grade    On Time Grad
1             1.0              1.0                  0
2             3.6              4.0                  1
3             4.0              2.0                  1
4             4.0              2.3                  1
5             1.0              2.0                  0


The clustered data can then be represented, along with the categorical variables, in a
manner that allows for visualization of the clusters, as well as the data patterns of each
case and the relation of categorical variables to the cluster patterns.

As suggested by Eisen and others, an Eisenplot should display cases as rows and
data categories as columns, such as subject specific grades (Eisen et al., 1998; van'tVeer
et al., 2002; Weinstein et al., 1997). Each data point is represented by varying intensities
of color blocks, according to a heat-map. For this study, the heat-map will range from a
deep red for the highest scores, to a grey for the middle scores, to a deep blue for the
lowest scores (Figure 11, scale). An Eisenplot for the example fictitious data presented
in Table 8 is shown in Figure 11.

Figure 11: Dendrogram and Eisenplot of the example dataset


Figure 11 combines all of the data from Table 8 as well as the cluster analysis
presented in Figure 9, into a single figure which shows the hierarchical clustering
structure of the data with a dendrogram and a cluster-ordered list of cases (Figure 11,
left), a color-coded representation of the data patterns for each case (Figure 11, center
color blocks), and a representation of the categorical variables in which a black block
indicates the presence of the on time graduation variable, and a white block the absence
(Figure 11, right). Hence, an Eisenplot is a figure which allows for cluster analysis
interpretation through the presentation of all of the data in the entire data set ordered
through hierarchical clustering. Figure 11 shows that for the example data set, students 3
and 4 were the most similar in their English and Mathematics grades followed by students
1 and 5. Student 2 was the most dissimilar of the data set, but was more similar to cluster
34 than to cluster 15 (Figure 11, left). Cluster 234 scored higher than cluster 15 overall
in English and Mathematics (Figure 11, color blocks), and for this fictitious example data
set, on time graduation was associated with generally higher grades in English and
Mathematics (Figure 11, right). Thus, this example has shown how the use of
hierarchical cluster analysis can order an unordered list of cases based on the similarity of
the data patterns of those cases, and display that information in an interpretable and
intuitive data display. However, the example presented above is a simplification of the
cluster analysis method used in this study, namely through the use of Euclidean distance,
and was presented in this primer because the Pythagorean Theorem is readily understood
and produces easily interpretable results.
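
The color conversion behind an Eisenplot can be sketched in a few lines. The Python
example below is an illustration only: the RGB anchor values, the 0.0 to 4.0 grade range
and the names `grade_to_rgb`, `blend` and `eisenplot_rows` are assumptions, since the text
specifies only that the heat-map runs from deep blue through grey to deep red. It lists the
Table 8 cases in the cluster order 1, 5, 2, 3, 4 with one color block per grade and a black
or white block for on time graduation.

```python
# ASSUMED RGB anchors for the heat-map; the text names only the hues.
DEEP_BLUE, GREY, DEEP_RED = (0, 0, 160), (128, 128, 128), (160, 0, 0)


def blend(start, end, t):
    """Linearly interpolate between two RGB colors for 0 <= t <= 1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def grade_to_rgb(grade, lowest=0.0, highest=4.0):
    """Map a numeric grade to a color: blue (low), grey (middle), red (high)."""
    t = (min(max(grade, lowest), highest) - lowest) / (highest - lowest)
    if t < 0.5:
        return blend(DEEP_BLUE, GREY, t / 0.5)
    return blend(GREY, DEEP_RED, (t - 0.5) / 0.5)


# Table 8 rows: student ID -> (English, Mathematics, on time graduation flag).
table8 = {
    1: (1.0, 1.0, 0),
    2: (3.6, 4.0, 1),
    3: (4.0, 2.0, 1),
    4: (4.0, 2.3, 1),
    5: (1.0, 2.0, 0),
}
cluster_order = [1, 5, 2, 3, 4]  # case order produced by the clustering example

eisenplot_rows = []
for student in cluster_order:
    english, mathematics, on_time = table8[student]
    eisenplot_rows.append((
        student,
        grade_to_rgb(english),
        grade_to_rgb(mathematics),
        "black" if on_time else "white",  # black marks presence of the variable
    ))

for row in eisenplot_rows:
    print(row)
```

A plotting library could draw each tuple as a row of filled rectangles; the mapping from
value to color is the part that turns a clustered table into a readable pattern.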

The hierarchical clustering strategy employed in this study differs from the above
example in two ways: standardization of scores and the use of uncentered correlation as
the distance measure. Overall, the steps of the clustering method parallel the above
detailed method. First, the data matrix Y was obtained, which contained the data for all
four cohorts of students with every subject specific grade, K-12:

\[ \mathbf{Y} = \begin{pmatrix} \mathbf{y}'_1 \\ \mathbf{y}'_2 \\ \vdots \\ \mathbf{y}'_n \end{pmatrix} = \left( \mathbf{y}_{(1)}, \mathbf{y}_{(2)}, \ldots, \mathbf{y}_{(p)} \right) \]

Equation 3

in which $\mathbf{y}'_i$ is an observation vector corresponding to each student case, and $\mathbf{y}_{(j)}$ is a
column corresponding to subject specific numeric grades, converted from letter grades as
detailed above. Second, each $\mathbf{y}_{(j)}$ was normalized through z-scoring, so that the data in the
entire matrix Y was replaced with z-scores based on the mean and standard deviation of each
subject specific and grade-level specific column, $\mathbf{y}_{(j)}$. This step is recommended to control for
overweighting in the clustering algorithm by arbitrary cases (Rencher, 2002; Romesburg,
1984). Third, publicly available online clustering software was used to cluster the data
(Vilo, 2003). The distance measure employed was uncentered correlation, which differs
from the above hypothetical example. A correlation based measure has been
recommended in the literature as superior to Euclidean distance (Rencher, 2002) and is
commonly used in hierarchical clustering (Eisen & DeHoon, 2002). The most commonly
used correlation based measure is the Pearson product moment correlation coefficient, in
which for any two series of numbers $x = \{x_1, x_2, \ldots, x_n\}$ and $y = \{y_1, y_2, \ldots, y_n\}$:

\[ r = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{\sigma_x} \right) \left( \frac{y_i - \bar{y}}{\sigma_y} \right) \]

Equation 4


In the Pearson product moment correlation, $\bar{x}$ is the mean of the values of series x,
$\bar{y}$ is the mean of the values of series y, $\sigma_x$ is the standard deviation of series x, and $\sigma_y$ is
the standard deviation of series y. However, a modified version of the Pearson product
moment correlation is known as uncentered correlation and is defined as:

\[ r_U = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i}{\sigma_x^{(0)}} \right) \left( \frac{y_i}{\sigma_y^{(0)}} \right) \]

Equation 5

in which

\[ \sigma_x^{(0)} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2} \]

Equation 6

\[ \sigma_y^{(0)} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} y_i^2} \]

Equation 7

The function defined in Equation 5 is highly similar to the Pearson correlation in
Equation 4, except that it assumes that the mean is 0 for every series even when it is not.
This is important when considering two vectors, x and y, that have the same shape but are
separated by a constant value. The Pearson correlation (a centered correlation) of
these two vectors would be 1, while the uncentered correlation of these two
vectors would not be 1 (Anderberg, 1973; Eisen & DeHoon, 2002). Stated in different
terms, “the uncentered correlation is equal to the cosine of the angle of two n-
dimensional vectors x and y, each representing a vector in n-dimensional space that
passes through the origin” (Eisen & DeHoon, 2002, p. 11). It is this uncentered correlation
which was used to calculate the distance measure for the hierarchical clustering in this
study. This required the use of a modified parametric statistic, uncentered correlation,
with semi-ordinal data, grades. This is an appropriate distance measure for this data based
on the recommendations of the current data mining and bioinformatics literature.
Furthermore, it should be noted that the choice of which distance measure is “best” for
any particular application is currently under contention (Anderberg, 1973; Ein-Dor et al.,
2006; Eisen & DeHoon, 2002; Eisen et al., 1998; Jain & Dubes, 1988; Lorr, 1983; Ru et
al., 2005; Romesburg, 1984; Shen et al., 2006; Sneath & Sokal, 1973; vandeVijver et al.,
2002; Weinstein et al., 1997; Zapala & Schork, 2006). Hence, while the question of
which clustering algorithms perform best with subject specific grades is of interest, it is
outside the scope of this study. Additionally, as in the example presented above, the
average linkage clustering algorithm was also employed in this study. Cluster
dendrograms and Eisenplots were generated as detailed in the example above using
publicly available online clustering software (Vilo, 2003).
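
The difference between the centered and uncentered correlation can be checked numerically.
The Python sketch below is illustrative and is not the clustering software used in this
study: the function names and the two short example series are hypothetical, and the
population standard deviation (dividing by n) is assumed for the z-scores. It also shows the
z-scoring applied to each grade column before clustering.

```python
import math


def zscore(column):
    """Replace each value with its z-score (ASSUMES the population SD, dividing by n)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]


def pearson(x, y):
    """Centered (Pearson) correlation: mean product of the z-scores of x and y."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def uncentered(x, y):
    """Uncentered correlation: like Pearson, but the mean of each series is taken as 0."""
    n = len(x)
    scale_x = math.sqrt(sum(a * a for a in x) / n)
    scale_y = math.sqrt(sum(b * b for b in y) / n)
    return sum((a / scale_x) * (b / scale_y) for a, b in zip(x, y)) / n


# Two hypothetical series with the same shape, separated by a constant of 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [value + 2.0 for value in x]

r_pearson = pearson(x, y)
r_uncentered = uncentered(x, y)
print(f"Pearson: {r_pearson:.4f}, uncentered: {r_uncentered:.4f}")
```

Because the uncentered form is the cosine of the angle between the two vectors, the constant
offset lowers it below 1 while the Pearson correlation stays at 1. Clustering software
commonly converts a correlation r into a distance as 1 - r; whether that exact conversion was
applied here is not stated in the text.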
CHAPTER V: GRADES AND STANDARDIZED ASSESSMENTS


Description of Sample

The sample for this study consisted of two entire student cohorts, for two school
districts, West Oak and South Pine, and included all students who entered either district
on-track for graduation with their cohort in either 1994 or 2006. The overall descriptive
variables for the sample are presented in Table 9.

Table 9: Descriptive variables and frequencies for all students in the sample

Overall Study Descriptive Variables

Total Number of Students Sampled    361
Percent NOTG§                       26.3
Percent with IEPs                   15.2
Gender (%)
    Female                          49.6
    Male                            50.4
Ethnicity (%)
    European American               58.7
    Hispanic                        12.5
    African American                6.1
    Asian                           1.9
    Multi-ethnic                    2.0

§ Excludes West Oak 1994 cohort due to lack of non-graduating student data


From Table 9, overall, 361 students were included in the sample. Females and males
were almost evenly split, while the ethnic majority of the sample is European American,
followed by Hispanic, African American, multi-ethnic and Asian students. Out of all four
cohorts included in the sample, 15.2% of the students had at least one year in which an
individual education plan (IEP) was included in the student’s file, indicating that the
student had been recommended for special education services at some point throughout
their time within the district. The overall graduation rate for the sample was 72.9%, and
thus the NOTG (Not On Time Graduation) was 26.3%. However, for the 1994 cohort in
West Oak, unfortunately the district had purged its files at some point in the past of all
non-graduating students, and thus did not have any student data files for the 1994 cohort
of students who did not graduate. Hence, the overall graduation rate and NOTG
percentages are most likely not valid indicators of the overall sample on-time graduation
rates when the West Oak 1994 cohort is included.

To understand better the student demographics of each cohort for each district,
student demographic variables are disaggregated in Table 10 by district and year.

Table 10: Descriptive variables and frequencies by district and cohort year

                                    West Oak          South Pine
Descriptive Variables               1994     2006     1994     2006

Total Number of Students Sampled    36       105      130      90
NOTG                                ...§     34.3     36.9     12.2
Percent with IEPs                   27.8     17.1     13.1     10.0
Gender (%)
    Female                          44.4     41.0     50.8     60.0
    Male                            55.6     59.0     49.2     40.0
Ethnicity (%)
    European American               83.3     28.6     73.8     62.2
    Hispanic                        8.3      29.5     2.3      8.9
    African American                0        9.5      2.3      10.0
    Asian                           2.8      0        1.5      4.4
    Multi-ethnic                    0        1        0.8      5.5
    No Ethnicity Data               5.6      31.4     0        8.9

§ Excludes West Oak 1994 cohort due to lack of non-graduating student data


Due to the vagaries of district data collection and retention, while many students’ records
included data such as ethnicity, for both districts, multiple students did not have any
ethnicity recorded. This issue with missing ethnicity data was most prevalent for the West
Oak 2006 cohort, with 31.4% of the student records containing no information on
ethnicity (Table 10). In addition, the data for the West Oak 1994 cohort includes only
those students who graduated on time, as described above.

From Table 10, it is obvious that both school districts are undergoing dramatic
demographic ethnic shifts, from a European American majority to a more diverse student
cohort, including many more Hispanic and African American students. Additionally, both
districts have experienced a decrease in the number of students with records of IEPs from
1994 to 2006, from 27.8% to 17.1% for West Oak, and 13.1% to 10.0% for South Pine.
However, for West Oak, this data is difficult to interpret due to the lack of NOTG student
data.

The demographic shift of both communities is made more obvious when the 
United States Census Bureau estimates of demographic populations for both the 1990 and 
2000 census are considered ("U.S. Census bureau", 2007). For West Oak, while the 
overall population was stable, in 1990 94% of the population was ethnically European 
American, 4% Hispanic, 1% African American and 1% Asian. In 2000 for West Oak, the 
percentages changed to 73% European American, 20% Hispanic, 5% African American 
and 2% Asian. For South Pine, a similar trend occurred. The population of South Pine 
grew by 13% between 1990 and 2000. In 1990, 94% of the population was European 
American, 2% Hispanic, 2% African American and 1% Asian. In 2000, for South Pine, 
the percentages shifted to 86% European American, 6% Hispanic, 5% African American, 
and 3% Asian. While the 1990 and 2000 community census data does not directly 
parallel the 1994 and 2006 cohorts by time and sample, it is obvious that the student 
populations and the communities of both districts are experiencing demographic shifts
over time.

Standardized Assessments and Grades

To begin to address the hypothesis of standardized assessments and teacher
assigned grades converging over time, standardized assessments and subject-specific
grades were collected for each student in the sample. The data collected included:
the ACT (ACT, 2007), generally taken by a subset of each cohort sometime during the
11th grade academic year; the state’s standardized assessment in multiple subjects given
in grades 3, 6, 8 and 10; and subject specific grades for all grade levels K-12. A brief
summary of the assessment data, overall and by cohort, is given in Table 11. ACT
composite scores are measured on a scale from 1 to 36. Grade Point Average (GPA) is
measured on a four-point scale as discussed in the methods. State test scores were
measured on a four-point category scale according to the following scheme: 4 - “not
endorsed”; 3 - “endorsed at basic level”; 2 - “endorsed met state standards”; 1 -
“endorsed exceeded state standards”.

Table 11: Means for assessment data for the full dataset, and by cohort

                                             West Oak              South Pine
Assessment                        Overall    1994§†     2006       1994†      2006

ACT Composite Score               19.984     22.00      18.61      20.36      20.05
                                  (4.342)    (4.950)    (4.149)    (4.114)    (4.167)
% of Cohort who took the ACT      34.6       47.2       34.3       25.4       43.3
10th Grade State Test‡
    Mathematics                   2.636      --         2.811      --         2.513
                                  (0.901)               (0.941)               (0.856)
    Science                       2.636      --         2.818      --         2.500
                                  (0.838)               (0.862)               (0.798)
    Social Studies                3.053      --         3.200      --         2.947
                                  (0.844)               (0.890)               (0.798)
    Reading                       2.331      --         2.444      --         2.246
                                  (0.618)               (0.664)               (0.572)
    Writing                       2.522      --         2.589      --         2.474
                                  (0.622)               (0.626)               (0.618)
High School GPA                   2.347      2.641      2.311      2.070      2.626
                                  (0.909)    (0.670)    (0.849)    (0.984)    (0.832)
High School GPA by Subject
    Math                          2.035      2.361      1.848      1.947      2.184
                                  (1.016)    (0.897)    (0.997)    (1.050)    (0.995)
    English                       2.252      2.570      2.323      1.960      2.455
                                  (1.048)    (0.939)    (1.036)    (1.041)    (1.029)
    Science                       2.098      2.049      2.027      1.960      2.359
                                  (1.032)    (0.853)    (1.065)    (1.091)    (0.954)

Note: Standard deviations are presented in parentheses below each mean
§ West Oak 1994 sample only includes students who graduated on-time
† 1994 state assessment scores not comparable to 2006
‡ State test scores reported by proficiency categories


Examining the data in Table 11 in more detail, ACT composite scores decreased
significantly for West Oak between the 1994 and 2006 cohorts, t(51) = 2.61, p < 0.05, but
not for South Pine, t(70) = 0.32, p = 0.751. Overall high school GPA decreased significantly
for West Oak between the 1994 and 2006 cohorts, t(114) = 2.06, p < 0.05. Conversely,
overall high school GPA increased significantly for South Pine between the 1994 and
2006 cohorts, t(207) = -4.32, p < 0.001. Similar trends were also observed for subject
specific GPAs. Overall, these data suggest that the 1994 West Oak cohort performed
better than South Pine on the ACT, but because West Oak has seen a decrease in
composite ACT scores between the 1994 and 2006 cohorts, while South Pine has
remained stable, South Pine’s 2006 cohort appears to be outperforming the West Oak
2006 cohort on the ACT. If this difference is truly a trend in the data attributable to the
actions of each district, rather than to exogenous variables such as cohort effects, it is
especially interesting given the similar demographic shift that each district is currently
experiencing. With ethnically changing populations, West Oak has seen declines in ACT
scores, while South Pine has maintained stable ACT scores. It would be interesting to
continue to track these trends between the two districts.
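
The ACT comparisons can be approximately checked from the summary statistics in Table 11.
The Python sketch below is a hedged reconstruction, not the procedure documented for this
study: it assumes a pooled-variance two-sample t-test, and it infers the number of ACT
takers in each cohort by multiplying the cohort size in Table 10 by the percentage who took
the ACT. The function name `pooled_t` is illustrative.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# ASSUMPTION: ACT takers per cohort = cohort size (Table 10) x percent who took the ACT.
n_west_1994 = round(0.472 * 36)
n_west_2006 = round(0.343 * 105)
n_south_1994 = round(0.254 * 130)
n_south_2006 = round(0.433 * 90)

# ACT composite means and standard deviations from Table 11, 1994 versus 2006.
t_west, df_west = pooled_t(22.00, 4.950, n_west_1994, 18.61, 4.149, n_west_2006)
t_south, df_south = pooled_t(20.36, 4.114, n_south_1994, 20.05, 4.167, n_south_2006)

print(f"West Oak:   t({df_west}) = {t_west:.2f}")
print(f"South Pine: t({df_south}) = {t_south:.2f}")
```

Under these assumptions the degrees of freedom and t values agree with those reported for
the ACT composite scores. The GPA comparisons cannot be checked the same way, because the
number of students with a recorded high school GPA in each cohort is not given.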

It should again be noted that the 1994 West Oak cohort did not include any
NOTG students, so comparisons of grades between 1994 and 2006 for West Oak are
problematic. However, as will be described below in Chapter VI, almost all of the
students who took the ACT graduated on-time for the three cohorts which contain NOTG 
student data. It will be assumed for this study that the same was true for the West Oak 
1994 cohort, so that while overall GPA may not be comparable for West Oak from 1994 
to 2006, ACT scores are comparable since it appears that students who take the ACT 
generally graduate on-time, and so are included in the 1994 West Oak sample. 

Unfortunately, during data collection, it was found that the state standardized test
data would not be comparable between the 1994 and 2006 cohorts. The state in which
both districts are located has undergone multiple rounds of state assessment design over
the past two decades, especially for the high school assessments, such that both the test
itself and the reporting methods for the test have changed dramatically. Test scores
recorded in each student’s file for the 1994 cohort were not reported as either category
scores or scale scores, and thus were not comparable to the test scores reported for the
2006 cohort, which were reported as category scores and scale scores. Thus, one initial
finding of this study is that for any districts within the state studied, state test scores are
not comparable over the twelve year time span for the 1994 and 2006 student cohorts,
because scores are not on equivalent or matched scales.

While the initial hope of comparing state standardized test scores to grades
over time could not be realized, the districts did record a comparable standardized
assessment in multiple subjects for all four cohorts, namely the ACT. The percentages of
each cohort which took the ACT are presented in Table 11. South Pine has seen a dramatic
increase in the percentage of students taking the ACT from the 1994 cohort to 2006. For
West Oak, the difference in percentages cannot be interpreted since the 1994 West Oak
cohort does not contain any NOTG students. Thus, since state standardized test scores
cannot be used to compare the 1994 and 2006 cohorts, ACT scores will be used instead
to examine the hypothesis that grades and standardized assessments are converging over
time. Using ACT scores is not ideal, since fewer than half of each cohort took the ACT,
and almost none of the students who took the ACT were NOTG. The assessment scores
of the large majority of students who did not take the ACT cannot be determined using
the data collected. While this study will now turn to comparing ACT scores and grades,
the results will be applicable only to the students who took the ACT in each cohort and
who also were enrolled in the district and had grades recorded.

To further explore the ACT data presented in Table 11, the subject specific ACT
scores for each cohort are presented in Figure 12. For all subsequent figures which refer
to the subject specific ACT tests, the following abbreviations will be used: MATH -
mathematics subtest; ENG - English subtest; READ - reading subtest; SCI - science
subtest. Boxplots were constructed for each ACT subject specific subtest for each cohort
and were compared (Figure 12).
Figure 12: Boxplots of ACT subject-specific subtest scores by cohort

Figure 12 is a set of standard boxplots in which the center line represents the
median value for each group, the lower and upper boundaries of each box represent the
first and third quartiles respectively, the whiskers above and below represent 1.5 times
the interquartile range beyond the box, and the circles represent outliers that lie beyond
1.5 times the interquartile range. For the ACT subtest data, the
differences between the districts and cohorts are similar to the overall ACT composite
averages (compare Table 11 and Figure 12). Specifically, while the median ACT score
for all four subtests has decreased in West Oak between the 1994 and 2006 cohorts, and
remained relatively stable in South Pine, the ACT scores for the 2006 West Oak cohort
are somewhat less variable than all of the other scores in each subject (second set of box
and whiskers from the right, all four panels, Figure 12). Overall variability in ACT
subtest scores appears to be lowest for all four cohorts in mathematics and science, while
it is the highest in reading. The subtest data appear to be generally normally distributed
with few outliers, other than West Oak 2006 mathematics, both South Pine cohorts in
English, and the South Pine 2006 cohort in reading (Figure 12). Thus, the ACT subtest
data are generally symmetric and appear generally normally distributed, and parallel the
overall trends of the ACT composite means.
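
The boxplot conventions described above can be made concrete with a short sketch. The
Python below is an illustration only: the scores are hypothetical values, not data from
this study, and the function name boxplot_summary is introduced here for the example. It
assumes the common convention in which the whiskers end at the most extreme observations
that lie within 1.5 times the interquartile range of the box, and it uses the inclusive
quartile method of the Python standard library; statistical packages differ slightly in
how they compute quartiles.

```python
from statistics import quantiles


def boxplot_summary(scores):
    """Return the quantities a standard boxplot draws for one group of scores."""
    data = sorted(scores)
    # First quartile, median, and third quartile (inclusive method).
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points that are still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical ACT subtest scores for one cohort (not taken from the study).
example_scores = [15, 17, 18, 19, 20, 20, 21, 22, 23, 24, 36]
summary = boxplot_summary(example_scores)
print(summary)
```

In this hypothetical group the score of 36 falls above the upper fence and would be drawn
as a circle, while the upper whisker stops at 24.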

Historically, student scores on a subject specific subtest of a standardized test,
such as the ACT, are highly correlated with the other subject specific subtests on that
same test (Brennan et al., 2001; Linn, 1982; Woodruff & Ziomek, 2004) due to test
design, student ability, student knowledge and test-wiseness (Mehrens & Lehmann,
1991). In this study, ACT composite and subject scores are highly correlated for the full
dataset, replicating the previous research (Table 12).

Table 12: Correlations of ACT composite and subject subtest scores, full dataset

Subject Test   Composite   ENG        MATH       READ       SCI
Composite      1.0
ENG            0.883***    1.0
MATH           0.830***    0.642***   1.0
READ           0.910***    0.776***   0.647***   1.0
SCI            0.842***    0.599***   0.725***   0.709***   1.0

Note: Correlations are Pearson product moment correlations, and n = 124 for all correlations
*** p<0.001

As seen in Table 12 for the full dataset, ACT subject subtests in English,
mathematics, reading and science all highly correlate with the overall composite score,
each exceeding a correlation of 0.8. Correlations between each subtest were also high, but
to a lesser extent than with the composite score. Specifically, the lowest subtest
correlation was between English and science, followed by English and mathematics. The
highest subtest correlation was between English and reading (Table 12). These results are
not surprising given the subject matter of the subtests, in that it is intuitive that English
and reading would highly correlate whereas English and mathematics might not correlate
as highly. The lower correlation between English and science is interesting, since reading
skill is a component of science instruction (Yore et al., 2003).
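
As an illustration of the statistic reported in Table 12, the sketch below computes a
Pearson product moment correlation from two lists of paired scores. The scores are
hypothetical and the function name pearson_r is introduced only for this example; the
study's own values were presumably produced with a statistical package rather than with
code like this.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least 2 pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    sxx = sum(a * a for a in dx)              # sum of squares for x
    syy = sum(b * b for b in dy)              # sum of squares for y
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical English and reading subtest scores for six students.
english = [18, 20, 22, 25, 27, 30]
reading = [17, 22, 21, 26, 29, 31]
r_example = pearson_r(english, reading)
print(round(r_example, 3))
```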

The comparison of a standardized assessment, such as the ACT, to teacher
assigned grades is of interest for multiple research contexts (Brennan et al., 2001; Girotto
& Peterson, 1999; Linn, 1982; Woodruff & Ziomek, 2004) including the current study
with a focus on the possible convergence of grades and standardized assessment systems
over time. However, historically in comparing ACT scores and grades, actual grades are
rarely used. Rather, the ACT corporation collects survey information from participating
students and asks students to self-report their grades in multiple subjects (Woodruff &
Ziomek, 2004). While interesting, student self reported grades are a problematic source
of information on actual student grades and recently have been critiqued as not accurately
reflecting actual grades in the subjects surveyed (Kuncel et al., 2005). The current study
helps to address this important issue and add to the research literature by using actual
recorded teacher assigned grades in comparison to ACT scores for every student who
took the ACT for two cohorts in two districts each, one of the first studies to do so.

Teacher assigned subject specific grades were recorded as detailed in the
methods. To simplify this discussion, the following comparisons to ACT scores will
utilize overall high school GPA and subject-specific GPAs, as well as focusing on 10th
grade second semester subject-specific grades. Because the ACT was taken sometime during
the 11th grade academic year for the majority of students in the sample, to explore how
subject-specific grades correlate with ACT scores, 10th grade semester 2 grades provide a
set of teacher assigned subject-specific grades that were awarded to students prior to the
year in which they took the ACT. These grades should reflect the cumulative ability of
each student in each subject as judged by their teachers, taking into account all of the
issues of hodge-podge and subjective grading detailed in chapter II. However, to help
address the issue of individual teacher bias during 10th grade semester 2, overall GPA and
subject-specific GPA are also compared to ACT scores below.

Subject-specific grades were recorded for each class taken by each student, and
classes were categorized into subjects for each high school semester and grade level.
Grades were then grouped by subject for each semester and grade level. Subject grouping
categories were as follows: mathematics, English, science, foreign language, social
studies, government, economics, band/music, physical education/health, computers, life
skills, art. The distribution of the types of classes taken during 10th grade semester 2
across the full dataset is shown in Figure 13.
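
The grouping step described above can be sketched in a few lines of Python. Everything
specific in this sketch is an assumption made for illustration: the record layout, the
numeric grade points, and the equal weighting of classes within a subject are
hypothetical, since the study's actual grade scale and GPA formula are given in the
methods and are not reproduced here.

```python
from collections import defaultdict


def subject_gpas(records):
    """Mean grade points per subject across all records passed in."""
    by_subject = defaultdict(list)
    for rec in records:
        by_subject[rec["subject"]].append(rec["points"])
    # Assumption: every class counts equally within its subject group.
    return {subject: sum(p) / len(p) for subject, p in by_subject.items()}


def grades_for_term(records, grade_level, semester):
    """Mean grade points per subject for a single grade level and semester."""
    term = [
        rec for rec in records
        if rec["grade_level"] == grade_level and rec["semester"] == semester
    ]
    return subject_gpas(term)


# Hypothetical class records for one student (grade points on an assumed 4-point scale).
records = [
    {"grade_level": 9, "semester": 1, "subject": "mathematics", "points": 3.0},
    {"grade_level": 10, "semester": 2, "subject": "mathematics", "points": 4.0},
    {"grade_level": 10, "semester": 2, "subject": "English", "points": 2.0},
    {"grade_level": 10, "semester": 2, "subject": "English", "points": 3.0},
]

overall_by_subject = subject_gpas(records)            # subject-specific GPA
tenth_semester_2 = grades_for_term(records, 10, 2)    # 10th grade semester 2 only
print(overall_by_subject, tenth_semester_2)
```

The first result corresponds to a subject-specific GPA across all semesters, and the
second to the semester-level subject grades used in the comparisons that follow.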



Figure 13: Distribution of the types of classes taken during 10th grade semester 2, full
dataset (axis: percent of students enrolled per subject)


For the full dataset, during 10th grade semester 2, over 70% of students took a
core set of classes that focused on mathematics, English, science and social studies
(Figure 13). For the remaining subjects, over 20% of the students took a class that dealt
with a foreign language, physical education, computer or life skills. Classes that focused
on government, economics, band or art all enrolled fewer than 20% of the students in the
dataset. The course names and percentages of students who were enrolled in each specific
course for each subject grouping during 10th grade semester 2 are detailed in Appendix A.

As with the correlations above of each subject-specific subtest of the ACT with
the other subtests of the ACT, it is of interest to examine how well grades in each subject
correlate with each of the other subjects for 10th grade semester 2. Grades for each
subject in 10th grade semester 2 for the entire dataset were correlated with the grades in
each of the other subjects using Spearman’s Rho correlations (Table 13). For the core set
of classes taken by most of the students including mathematics, English, science and
social studies, highly significant correlations are above 0.5, many above 0.6. The highest
correlation among the core set of classes is between science and social studies at 0.709,
the lowest is between mathematics and social studies at 0.536. Among the other subjects
with an n over 30, many of the correlations also appear fairly high, ranging from about
0.4 to 0.7. Subjects such as foreign language and computers appear to correlate at about
0.5 across the core set of classes of mathematics, English, science and social studies.
Interestingly, band, physical education, life skills, and art do not appear to correlate as
highly with the core set of classes, ranging in correlations from about 0.3 to 0.5. These
differences in correlation may be interpreted in several ways. First, because mathematics,
English, science and social studies are considered a core curriculum and are tested by the
state test as well as the ACT, teacher grading practices across these subjects may be more
aligned than with non-core subjects such as band, physical education, life skills and art.
Additionally, the curriculum and grading practices of subjects such as foreign language
and computers may be more aligned with the core set of classes than with the non-core.

Table 13: Correlations of subject-specific grades for 10th grade semester 2, full dataset (Spearman’s Rho)

Note: The n of each correlation is in parentheses below the correlation
*** p-value < 0.001
** p-value < 0.01
* p-value < 0.05


A second interpretation may be that student performance in the core subjects is 
similar to their performance in each of the other core subjects. One can imagine that the 
lessons learned on how to negotiate the grading system may be similar in subjects that 
require similar types of participation, homework and assessments, such as mathematics, 
English, science and social studies. However, for subjects that may require a different set 
of skills to demonstrate achievement, participation, homework and assessment, such as 
band, physical education, life skills, and art, student performance may not correlate as 
well with the core set of subjects or with any of the other non-core set of subjects (Table
13). Unfortunately, due to low n with few of the same students taking classes across the
non-core subjects, correlations with subjects such as government and economics, as well 
as correlations between the non-core subjects are difficult to interpret. It would be of 
interest for future studies to delve further into these differences, collecting a larger 
sample, to show if across a broader population of students the correlation of grades 
remains fairly high for the core subjects, and lower for the non-core subjects. 
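
Because each pair of subjects in Table 13 is based only on the students who took both
classes, the n differs from cell to cell. The sketch below illustrates that idea with
Spearman's Rho computed as a Pearson correlation on ranks, with tied grades given the
average of their ranks and students missing either grade dropped from that pair. The
grade values, the use of None for a class not taken, and the function names are
assumptions made for this example, not details taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Return (rho, n) using only the students who have both grades."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least 2 complete pairs")
    rx = average_ranks([a for a, _ in pairs])
    ry = average_ranks([b for _, b in pairs])
    mean_x = sum(rx) / n
    mean_y = sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when every grade in a subject is identical")
    return sxy / sqrt(sxx * syy), n


# Hypothetical grade points; None marks a class the student did not take.
math_grades = [4.0, 3.0, 3.0, 2.0, None, 1.0]
art_grades = [3.0, 4.0, None, 2.0, 3.0, 1.0]
rho, n_pairs = spearman_rho(math_grades, art_grades)
print(round(rho, 3), n_pairs)
```

With few students taking both classes, n_pairs shrinks quickly, which is the same
low n problem noted above for the non-core subjects.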

Despite these differences in correlations, Table 13 shows that, for the full dataset,
student grade performance is similar across core subjects, with significant correlations
over 0.5, and also is somewhat similar with non-core subjects, with significant
correlations over 0.3. Historically, since grading data has been thought of as difficult to
collect, few studies have dealt with the correlation of subject specific grades, often
lacking the data altogether and relying on GPA or self-reported grades (Kuncel et al.,
2005). Of the studies that have collected subject specific grades, almost all sample a
population of students, rather than collect entire cohorts of data (Alexander et al., 2001;
Brennan et al., 2001; Girotto & Peterson, 1999). However, as with the correlations of
ACT subject subtests, and as shown here in Table 13 for the full dataset, grades in one
core subject correlate with grades in other core subjects (Brennan et al., 2001). This
correlation may be the result of any of the interpretations discussed above, from teacher
and curriculum assessment alignment, to student acquired skill at negotiating the hodge-
podge grading system and knowing how to participate, hand in homework and show up
for class (Brookhart, 1991; Cross & Frary, 1999). In addition, the high correlation of core
subject grades may also be due to student aptitude (Jencks & Phillips, 1999) in which
student innate ability in core subjects influences the grade teachers assign. However, no
matter the interpretation of the correlations, for the data presented for the full dataset, if a
student’s grade is high or low in one core subject, that same student’s grade in another
core subject is likely to be very similar, as is the case with correlations of ACT subtest
scores. This implies that for the core subjects of mathematics, English, science and social
studies, student grades and ACT test performance depend more on the student than on the
specific subject, in that student achievement appears to be somewhat subject independent.
This result implies that to examine the main hypothesis for this chapter of the correlation
of grades and standardized assessments, it would be advantageous to examine
achievement scores in multiple subjects for both grades and ACT simultaneously, rather
than just GPA or ACT composite scores, to further explore this student cross-subject
performance result as well as to show whether the correlation between grades and ACT
scores has changed over time.

Additionally, when 10th grade semester 2 subject-specific grades are correlated
with the ACT composite and subtest scores for the full dataset, core subject scores show
moderate and significant correlations (Table 14), replicating past research which has
shown similar moderate correlations between grades and standardized assessments
(Brennan et al., 2001; Linn, 1982; Woodruff & Ziomek, 2004). Interestingly, as opposed
to the moderate intra-subject correlations for 10th grade semester 2 grades between core
subjects and non-core subjects, such as mathematics and art (Table 13), correlations
between 10th grade semester 2 grades for non-core subjects and the ACT composite and
subtests are lower, and mostly not statistically significant (Table 14). This may be due to
the low n for the number of students in the full dataset who took non-core classes and the
ACT.


Table 14: Correlations of ACT composite and subtest scores with 10th grade semester 2
grades, full dataset (Spearman’s Rho)

Subject Grades,         ACT              ACT              ACT              ACT              ACT
10th Grade Semester 2   Composite        MATH             ENG              READ             SCI
Mathematics             0.398*** (121)   0.517*** (120)   0.345*** (120)   0.289** (120)    0.294** (120)
English                 0.578*** (123)   0.423*** (122)   0.510*** (122)   0.557*** (122)   0.458*** (122)
Science                 0.44?*** (121)   0.451*** (120)   0.334*** (120)   0.420*** (120)   0.370*** (120)
Foreign Language        0.458** (47)     0.262 (47)       0.466** (47)     0.366* (47)      0.373** (47)
Social Science          0.548*** (113)   0.499*** (112)   0.378*** (112)   0.515*** (112)   0.502*** (112)
Government              0.705* (10)      0.375 (10)       0.773* (10)      0.452 (10)       -0.003 (10)
Economics               -0.051 (5)       -0.872 (5)       0.158 (5)        0.359 (5)        -0.296 (5)
Band                    0.347 (28)       0.171 (28)       0.282 (28)       0.485** (28)     0.263 (28)
Physical Education      0.151 (45)       0.063 (44)       -0.061 (44)      0.159 (44)       0.253 (44)
Computers               0.296 (42)       0.391* (42)      0.234 (42)       0.289 (42)       0.314* (42)
Life Skills             0.275 (28)       0.117 (28)       -0.143 (28)      0.355 (28)       0.457* (28)
Art                     0.096 (29)       0.199 (28)       -0.016 (28)      0.069 (28)       0.013 (28)

Note: Correlations are Spearman’s Rho; the n of each correlation is in parentheses
*** p<0.001
** p<0.01
* p<0.05

However, this difference between the correlation of core with non-core subject grades
and the correlation of non-core subject grades with ACT subtests (compare Table 13 and
Table 14) may also be due to the difference between what is measured by grades and
what is measured by the ACT. While the ACT may measure the acquisition of
knowledge, grades may measure the acquisition of knowledge (because they correlate
with the ACT) and also a student’s success at negotiating the social
processes of schooling and the hodge-podge subjective grading system. This would
hypothetically result in a moderate correlation between core and non-core subject grades,
which is seen in this study (Table 13). For this study, this type of correspondence
between core and non-core subject grades is termed a “success at school factor” (SSF), in
which the similar variance between the grades in two or more different subjects may be
attributable to a student’s ability at negotiating school as an overall social process, while
the non-similarity is attributed to the differences in the correlation between core and non-
core subjects. The moderate correlation between ACT and core subject grades (Table 14)
is attributable to the similar knowledge needed for both assessments. However, the ACT
does not correlate with non-core subject grades (Table 14), while core subject grades do
moderately correlate with non-core subject grades (Table 13). As one possibility, this may
suggest that the ACT measures only one part of grades, knowledge acquisition, which
is about 25% of the variance in grades (the square of the roughly 0.5 correlation of ACT
and core subject grades), while grades may measure knowledge acquisition plus another
variable that is not related to what is measured by the ACT. This result, in combination
with the above finding that student grade performance appears to be somewhat subject
independent, leads to the hypothesis here that this other variable is a “Success at School
Factor” (SSF). The evidence presented here is admittedly initial evidence only, with the
major threat to the validity of this argument coming directly from the small and intact
samples that are biased towards students who take the ACT. This topic of a possible
Success at School Factor will be further addressed in chapters VI and VII.
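
The 25% figure follows from squaring the correlation. The short Python sketch below
shows the arithmetic, first for the rounded value of 0.5 and then for the correlations of
each core subject grade with its matching ACT subtest as reported in Table 14. The
function name shared_variance is introduced for this example, and because the Table 14
values are Spearman's Rho rather than Pearson correlations, the squared values should be
read as a rough guide only.

```python
def shared_variance(r):
    """Proportion of variance two measures share, given their correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# The rounded correlation used in the text.
approximate_share = shared_variance(0.5)

# Correlations of core subject grades with the matching ACT subtest (Table 14).
table_14_matching = {
    "mathematics grade, ACT MATH": 0.517,
    "English grade, ACT ENG": 0.510,
    "science grade, ACT SCI": 0.370,
}
shares = {label: shared_variance(r) for label, r in table_14_matching.items()}

print(f"r = 0.5: {approximate_share:.0%} shared variance")
for label, share in shares.items():
    print(f"{label}: {share:.1%} shared variance")
```

On this reading, roughly three quarters of the variance in core subject grades is left
unexplained by the matching ACT subtest.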

One way in which to test a success at school factor would be to correlate subject-
specific grades with a standardized test given to a broader population than the population
that took the ACT. This would help to include students not included in the above tables,
such as students who do not graduate on time or who choose not to pursue college. This
point can be tested with this dataset using standardized state high school test scores for
the 2006 cohorts for both West Oak and South Pine. While the use of the state standardized
test narrows the student sample to just the two 2006 cohorts, it broadens the type of student
included, since the vast majority of students took the state standardized high school tests,
both on-time graduators and NOTG students. This analysis rests on the assumption that
the standardized state test is similar to the ACT, in that it assesses the extent of student
academic knowledge. If the state test correlates with core subject grades only, there is
support for the hypothesis that teacher assigned grades may assess both academic
knowledge and a success at school factor. These correlations are presented in Table 15.

Table 15: Correlations of standardized state high school test scale scores with 10th grade
semester 2 grades, 2006 cohorts - West Oak and South Pine (Spearman’s Rho)

Subject Grades,         State Standardized Tests
10th Grade Semester 2   Math             Science          Social Studies   Reading          Writing
Mathematics             0.399*** (122)   0.357*** (120)   0.291** (122)    0.272** (118)    0.161 (124)
English                 0.373*** (121)   0.485*** (119)   0.397*** (121)   0.452*** (117)   0.325*** (123)
Science                 0.392*** (121)   0.467*** (119)   0.496*** (121)   0.444*** (117)   0.256** (123)
Foreign Language        0.282 (38)       0.482** (36)     0.304 (39)       0.354* (36)      0.517*** (40)
Social Studies          0.482*** (112)   0.566*** (111)   0.543*** (114)   0.466*** (108)   0.262** (114)
Government              0.498 (13)       0.758** (12)     0.467 (12)       0.543 (13)       0.523 (13)
Economics               -0.026 (5)       0.410 (5)        0.821 (5)        0.821 (5)        0.553 (5)
Band                    0.253 (26)       0.084 (26)       0.156 (25)       0.303 (26)       0.102 (26)
Physical Education      0.227 (57)       0.196 (56)       0.189 (58)       0.214 (55)       0.338** (59)
Computers               0.253 (54)       0.212 (54)       0.223 (55)       0.369** (53)     0.356** (56)
Life Skills             0.496** (28)     0.349 (29)       0.234 (28)       0.332 (28)       -0.157 (28)
Art                     0.068 (47)       0.119 (46)       0.083 (46)       0.214 (46)       0.185 (47)

Note: Correlations are Spearman’s Rho; the n of each correlation is in parentheses
*** p<0.001
** p<0.01
* p<0.05


The data presented in Table 15 supply further evidence supporting the possible
existence of a SSF component in grades. Grades from 10th grade semester 2 core subjects,
such as mathematics, English, science and social studies, moderately correlate with the
state standardized test across multiple subject tests, including mathematics, science,
social studies, reading and writing (Table 15). However, the state subject tests generally
do not correlate with non-core subject grades, such as band, physical education and art.
Also, Table 15 supplies additional evidence that both grades and the standardized state
test scores are independent of the actual subject and appear tied more to each individual
student. The moderate correlations suggest that students who do well or do poorly in one
subject will generally have similar scores across the other subjects assessed. These data
supply an initial test of the success at school factor, and indicate that grades may measure
two important factors in the lives of students, academic knowledge and success at school.
If this is true, these results have important implications for the emphasis on standardized
tests as the main driver of data driven decision making in schools, districts, states and the
nation. As will be detailed in chapter VI, students’ grades can predict whether a student
will or will not graduate on-time. Acknowledging that graduation from high school is an
important predictor of student life outcomes, investing in a better understanding of a
possible success at school factor and its possible assessment through grades, and non-
assessment through standardized tests, has deep implications for school leaders engaged
in data driven decision making. These issues will be further taken up in chapter VII.

The correlation of grades and standardized assessments 

The hypothesis for this chapter is that grades and standardized assessments may
be converging over time, as discussed in chapters II and III. To explore this hypothesis,
correlations of grades and ACT scores between the 1994 and 2006 cohorts for both
districts are examined in this section. First, correlations of ACT scores with overall high
school GPA for both years for both districts will be presented, then the more detailed
subject-specific high school GPAs, followed by the fine grained 10th grade semester 2
subject specific grades.

The Pearson product moment correlations of ACT composite and subject-specific
subtest scores with overall high school GPA (HSGPA) vary dramatically across the two
cohorts in each of the two districts (Figure 14). For West Oak, the correlations of 1994
(Figure 14, left panel, dashed line) and 2006 HSGPA (Figure 14, left panel, solid line)
with ACT scores are highly similar for both years for the ACT composite, reading and
science subtests, but vary dramatically and inversely for the mathematics and English
ACT subtests. The high variation for the 1994 West Oak cohort may be due to the small
sample size. The South Pine correlations are somewhat more moderated, in that the
correlations of 1994 (Figure 14, right panel, dashed line) and 2006 HSGPA (Figure 14,
right panel, solid line) with ACT scores are relatively similar for the ACT composite,
mathematics and science subtests, but differ for the English and reading subtests. Overall,
the correlation of HSGPA with the ACT shows mixed results, with both districts showing
little change between the 1994 and 2006 cohorts for the correlation of HSGPA and ACT
composite, and multiple differences across the ACT subtests. Confidence intervals for
these correlations across each cohort for each district all overlap (see Appendix B),
indicating that there is no statistically significant difference between the two cohorts in
the correlation between HSGPA and ACT scores in either district.

Figure 14: Correlation of high school GPA and ACT between the 1994 and 2006 cohorts
for both West Oak and South Pine (Pearson correlations). Panels: West Oak (left), South
Pine (right); horizontal axis in each panel: COMP, MATH, ENG, READ, SCI.

In addressing the question of whether grades and ACT scores are
converging over time, the data in Figure 14 answer the question to some extent. For both
districts, the correlation between HSGPA and ACT has increased slightly from the 1994
cohort to the 2006 cohort, but the difference is exceedingly small and not statistically
significant. Additionally, the differences across ACT subtests in their correlations with
HSGPA may indicate that the ACT composite or the HSGPA is masking trends that
are occurring in less aggregated data. In addition, from the data presented above in Table
14, ACT scores do not correlate well with non-core subject grades that an overall
measure of grades, such as high school GPA, includes. If grades and standardized
assessments are converging over time, that convergence may only be occurring in core
subjects, especially core subjects that are tested by standardized tests such as the ACT,
namely mathematics, English and science. Thus, to delve deeper into the change in
correlations over time, subject-specific GPAs in mathematics, English and science were
correlated with the ACT subtest scores (Figure 15 and Figure 16).

Figure 15: Correlations of high school subject specific GPA with the ACT subtest, full
dataset


As in Table 14, subject-specific high school GPAs moderately correlated across
all of the ACT subtests for the full dataset (Figure 15), indicating that subject-specific
GPA correlates similarly to grade-level subject-specific grades, and thus may be a more
interesting variable to correlate with ACT scores since subject-specific GPA does not
include grades from non-core subjects as HSGPA does. Subject-specific GPA
correlation with ACT subtest scores varied between 0.4 and 0.6 and followed the pattern
described in the tables and figures above in that mathematics GPA correlated higher than
English and science GPA with the mathematics ACT subtest. A similar trend was repeated
for the other subjects. As with the data presented above for non-similar subjects,
mathematics and science GPA correlated the least with the English ACT subtest, while
English GPA correlated the least with the mathematics ACT subtest, and English and
mathematics GPA correlated similarly and lower than science GPA with the ACT science
subtest (Figure 15). Regardless, the correlations of the full dataset do not address the
question of a change in correlation over time, so a comparison of cohorts is required.

Figure 16: Correlations of subject-specific high school GPA and ACT between the 1994
and 2006 cohorts for both West Oak and South Pine (Pearson correlations)


Correlations for high school subject-specific GPA for mathematics, English and
science to the ACT subtests were disaggregated by cohort and district (Figure 16). As in
the data presented above, correlation trends for West Oak indicate that the overall
correlation of grades and ACT scores decreased between the 1994 and 2006 cohorts
(Figure 16, top panels). For West Oak in specific ACT subtests there were striking
differences between the 1994 and 2006 cohorts in the correlation of all three subject-
specific GPAs with the mathematics, English and reading ACT subtests. Specifically, the
correlation between all three subject-specific GPAs and the ACT English subtest has
decreased, from the 1994 cohort in which all three correlations were over 0.6, to the 2006
cohort in which all three correlations were 0.5 (Figure 16, compare all three lines in the
top left panel ENG column with top right panel ENG column). Also, the correlation
between English GPA and the ACT reading subtest has decreased between the two
cohorts. Interestingly, the correlation between all three subject-specific GPAs and the
ACT mathematics subtest has increased between the 1994 cohort and 2006 cohort, rising
from lows under 0.4 to highs over 0.5 (Figure 16, compare all three lines in the top left
panel MATH column with the top right panel MATH column). However, the overall
patterns for West Oak in Figure 16 suggest that the correlations between subject-specific
GPA and ACT scores have not substantially increased from the 1994 cohort to the 2006
cohort across multiple subjects and tests. It must be noted, however, that the confidence
intervals for each of the correlations for each of the 1994 cohorts overlap with those of the
comparison 2006 cohorts (see Appendix B). This most likely is due to the small sample
sizes of just the students in each cohort who took the ACT, but does indicate that none of
the differences between the cohorts is statistically significant.
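
The overlap comparison described above can be illustrated with a short sketch. The
method used to build the intervals in Appendix B is not shown in this section, so the
Python below assumes one common choice, the Fisher z transformation for a Pearson
correlation; the correlations and sample sizes are hypothetical, and the function names
are introduced only for this example.

```python
from math import atanh, sqrt, tanh

Z_95 = 1.959963984540054  # two-sided 95% critical value of the standard normal


def fisher_ci(r, n, z_crit=Z_95):
    """Approximate confidence interval for a Pearson correlation (Fisher z)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = atanh(r)                 # transform r to the z scale
    se = 1.0 / sqrt(n - 3)       # standard error on the z scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical cohorts: a small earlier cohort and a larger later cohort.
ci_early = fisher_ci(0.40, 25)
ci_late = fisher_ci(0.55, 60)
print(ci_early, ci_late, intervals_overlap(ci_early, ci_late))
```

With a sample as small as the hypothetical earlier cohort, the interval spans most of
the range from near zero to strong correlation, which is why even sizeable differences
between cohorts can fail to reach significance.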

In contrast to West Oak, the data for South Pine indicate that the correlations
between subject-specific GPAs and ACT scores may have increased between the 1994
and 2006 cohorts (Figure 16, bottom panels). While the lowest two correlations for South
Pine were for the 1994 cohort in mathematics and science GPA correlated to the ACT
English subtest (Figure 16, bottom left panel), the South Pine 2006 cohort appears to
have higher correlations than the 1994 cohort in the majority of the subject-specific
GPAs and ACT subtests (Figure 16, bottom right panel). The evidence presented here
shows a general but statistically non-significant increase in the correlation of subject-
specific GPAs to ACT scores for South Pine but not West Oak.

To further explore these differences in correlations, 10th grade semester 2 grades 
in mathematics, English, and science were correlated to the ACT subtest scores for both 
cohorts for both districts using Spearman’s Rho correlation (Figure 17 and Figure 18). 
As stated above, 10th grade semester 2 grades are a relevant comparison to ACT scores, 
since students take the ACT during the academic year that follows the assignment of the 
10th grade semester 2 grades. For West Oak, the correlations across multiple ACT 
subtests of teacher assigned grades in mathematics, English and science are highly 
variable and show that generally the correlations for the 2006 cohort are below the 
correlations for the 1994 cohort (Figure 17, compare dashed lines for the 1994 cohort to 
solid lines for the 2006 cohort). This is most obvious in English and science, in which the 
correlations between English and science grades with ACT subtests in English and 
science respectively are lower for the 2006 cohort than they are for the 1994 cohort 
(Figure 17, center and bottom panels). Only the correlations between mathematics and 
English grades and the ACT mathematics subtest are appreciably higher for the 2006 
cohort than the 1994 cohort (Figure 17, top and center panels); however, for all of these 
correlation comparisons, confidence intervals overlap and thus the differences are not 
statistically significant (see Appendix B). 

Figure 17: Correlations of West Oak 1994 and 2006 cohort 10th grade semester 2 
subject-specific grades in mathematics, English and science to ACT subtests 
(Spearman’s Rho) 

Figure 18: Correlations of South Pine 1994 and 2006 cohort 10th grade semester 2 
subject-specific grades in mathematics, English and science to ACT subtests 
(Spearman’s Rho) 

Similar to the data presented in Figure 16 above, the correlations for South Pine 
between 10th grade semester 2 subject-specific grades in mathematics, English and 
science to the ACT subtests are somewhat higher across subjects for the 2006 cohort in 
comparison to the 1994 cohort (Figure 18). For grades in each subject examined, 2006 
cohort correlations to the ACT subtests exceeded the correlations of the 1994 cohort 
(Figure 18, compare dashed lines to solid lines); however, as with the West Oak data, 
confidence intervals for each correlation comparison overlap, indicating that none of the 
differences are statistically significant (see Appendix B). This data suggests that while 
West Oak has not seen an overall increase in the correlation between grades and ACT 
scores, South Pine has seen a slight but statistically non-significant increase in 
correlations; however, that increase is only for the correlations between core subject 
grades and the ACT, namely mathematics, English and science. 

The research question for this chapter is: to what extent has the correlation 
between grades and standardized assessments changed from earlier student cohorts to 
more recent cohorts? Based on data from two districts for two cohorts, each separated by 
12 years, the evidence is mixed. West Oak has not seen an appreciable increase in the 
correlation between grades and ACT scores, while the data suggests that South Pine has 
seen a non-statistically significant increase, at least for the core subjects of 
mathematics, English and science. This point must be further critiqued in that the entire 
burden of the correlations presented in this chapter has rested not on state test scores 
administered to each student (a better but impossible option due to the test records 
themselves) but on the ACT scores of a subset of the sample of students from each cohort 
(those who took the ACT). This issue greatly weakens the finding that the South Pine 
data may support the hypothesis. Additionally, the differences in the South Pine 
correlations between the 1994 and 2006 cohorts are not statistically significant. This may 
be due to the small sample sizes, but also serves to weaken support for the hypothesis. 

However, if one considers this study a pilot study, and the data from West Oak 
and South Pine merely base-line data, as is suggested in chapter III, then these findings 
are encouraging and suggest further work. It appears that for South Pine, the correlation 
between grades and a standardized assessment may have increased over time, providing 
initial confirming evidence for the first hypothesis proposed in chapter III. Caveats to this 
finding, as well as a broader discussion and suggestions for future work, are discussed 
further in the final chapter. 



Bowers, A.J. (2007) 



CHAPTER VI: GRADE PATTERNING AND PREDICTION 


Can grades be used for data driven decision making? This is the primary question 
addressed in this study. The data presented in chapter V show that grades appear to be 
converging with one form of standardized assessment for one of the districts in this study, 
suggesting that they have some potential for data driven decision making. The initial data 
presented suggest that grades may measure more about a student’s performance in the 
schooling system than standardized tests historically have. Specifically, for this sample, 
there is tentative evidence that grades might be measuring both academic knowledge and 
success at schooling, argued above as two separate components of teacher assigned 
subject-specific grades. Hence, rather than being subjective and irrelevant measures (as 
much of the literature on grading would lead one to believe), grades appear to measure 
these two variables. This, of course, is an empirical question, one worthy of further 
examination. This study now turns to another aspect of the study: to demonstrate that 
grades can be used by school leaders to make decisions that positively impact the lives of 
students. Toward that purpose, this chapter turns to the next two sets of research 
questions: to what extent do previous grading patterns predict future grading patterns, and 
to what extent are grading patterns predictive of qualitative student outcomes, such as 
on-time graduation? These are important questions to consider in relation to data driven 
decision making since prediction of future performance at a point early in a student’s 
schooling allows for interventions by teachers and administrators, if necessary, and since 
we know that graduating from high school is a good predictor of a student’s life 
outcomes (Kienzi & Kena, 2006). If grade patterns are useful in predicting on-time 
graduation or not on time graduation (NOTG) at the earlier stages of schooling, district 
and school leaders would gain an additional tool to help identify students who may 
need more focused attention by the district. Since these two research questions deal with 
grade patterns, and the predictive ability of these grade patterns on future student grades 
and qualitative outcomes such as on-time graduation, these two questions will be 
considered in tandem as the data is presented. 

Not On-Time Graduation (NOTG) 

The primary qualitative outcome that this study will focus on is “not on time 
graduation” (NOTG). For this sample, NOTG is used rather than “dropping out”, primarily 
because 1) the term and measurement of “dropout” is currently contested in the literature, 
and 2) for the data collected, those students who did not have evidence of on-time 
graduation in their permanent files were assigned to the NOTG category. Because of the 
issues of identifying NOTG students, it must be assumed that some proportion of the 
NOTG students are false positives, most likely resulting from the student having a valid 
transfer to another school district and no record of that transfer existing in the student’s 
files. Despite this issue, the NOTG variable is an indication that the student did not 
graduate on time with their cohort in either of the two districts. While the false positive 
issue is a threat to the internal validity of the conclusions of this study because the 
number of false positives cannot be estimated, NOTG is a reasonable designation given 
that the majority of the students coded NOTG did have records of either non-attendance, 
refusing to attend the school, incarceration or expulsion. In this way, NOTG, while not a 
“pure” indication of dropping out, should be considered a reasonable proxy. In addition, 
as mentioned in the methods, an unknown segment of on time graduators may also be 
false positives, due to some students transferring to other school districts and their 
graduation status becoming unknown. Before discussing student grade patterns and how 
those patterns may predict NOTG, the qualitative outcomes of NOTG and on-time 
graduation for the dataset will first be detailed. 

Students who graduate from high school and receive a regular diploma, on 
average, experience better life outcomes in terms of employment, type of job, and salary 
as well as lower rates of public assistance and incarceration (Dynarski & Gleason, 2002; 
Jimerson et al., 2000; Kienzi & Kena, 2006; Laird et al., 2006; Lehr et al., 2003). The 
research literature to date examining student graduation has focused on large-scale 
estimations of national graduation and dropout rates. For the 2003-2004 school year, the 
United States Department of Education estimated a national graduation rate of 74.3% 
(Seastrom et al., 2006), and that data is supported by other studies that have also 
estimated national average graduation rates above 70% (Greene & Caire, 2001; Greene & 
Winters, 2005). However, other recent studies have begun to reexamine the methods of 
national graduation estimation and have reported national average graduation rates below 
70% (Swanson, 2004). Applying these broader measures of graduation rates, using the 
NOTG data in this study to calculate on-time graduation rates for South Pine, rates have 
increased from 63.1% in 1994 to 87.8% in 2006, while the graduation rate for West Oak 
in 2006 was 65.7%. The graduation rate for West Oak in 1994 cannot be calculated since 
the 1994 cohort data files had been purged of all students who did not graduate on-time 
with their cohort, as described above. This high variability over years, as well as between 
districts, is reflective of the national debate on average graduation rates for all districts, 
and is thus not unexpected. It replicates the more general national averages and extends 
the findings on graduation rates to the individual district level for this sample. 

To delve further into the NOTG data, the percentages of NOTG students 
disaggregated by IEP status, gender and ethnicity are shown in Table 16. 

Table 16: Descriptive variables and frequencies by district and cohort year for students 
who did not graduate on-time (NOTG) 

                                West Oak      South Pine 
NOTG Descriptive Variables        2006       1994     2006 

Percent with IEPs                 22.2       16.7     27.3 
Gender (%) 
   Female                         36.1       35.4     45.5 
   Male                           63.9       64.6     54.4 
Ethnicity (%) 
   European American              28.6       91.7     37.5 
   Hispanic                       42.9        5.6     12.5 
   African American               28.6        2.8     37.5 
   Asian                          N/A         0        0 
   Multi-ethnic                    0          0        0 

Since the West Oak 1994 cohort only includes students who graduated on-time, 
that cohort is not included in Table 16. For the other three cohorts, the data is striking. 
Across all three cohorts, comparing the students who did not graduate on-time (NOTG) 
with the overall demographics, males consistently graduate on-time at lower rates than 
females, and Hispanics and African Americans graduate on-time at lower rates than 
European Americans and Asians (compare Table 16 and Table 10, overall demographic 
variables). These findings replicate previous 
studies and extend the findings to the context of small first-ring suburbs. Previous studies 
have focused on large urban districts, namely Chicago and Baltimore, and have shown 
that the students who most frequently do not graduate on-time are males, Hispanics and 
African Americans (Alexander et al., 2001; Allensworth, 2005; Allensworth & Easton, 
2005; Campbell, 2004; Roderick & Camburn, 1999). An examination of the broader U.S. 
population has shown that, for the U.S. as a whole, on average there is no difference in 
on-time graduation rates between females and males, but that on-time graduation rates for 
Hispanics and African Americans are much lower than for other ethnic groups (Laird et 
al., 2006), as confirmed in this study (Table 16). It should be noted that to conceal the 
identity of the students and districts, absolute numbers for each category will not be 
discussed; however, for many of the categories in Tables 16 and 17, the number of 
students in any one cohort in any one district may be only in the single digits. 


Table 17: Descriptive variables and frequencies by district and cohort year for students 
who were retained 

                                                       West Oak      South Pine 
Retained Student Descriptive Variables     Overall       2006       1994     2006 

NOTG (%)                                    85.2         81.8       100      84.6 
IEPs (%)                                    25.9         18.2       33.3     30.8 
Gender (%) 
   Female                                   33.3         18.2       66.7     38.5 
   Male                                     66.7         81.8       33.3     61.5 
Ethnicity (%) 
   European American                        36.8         28.6       33.3     44.4 
   Hispanic                                 21.1         14.3       33.3     22.2 
   African American                         42.1         57.1       33.3     33.3 
   Asian                                     0           N/A         0        0 
   Multi-ethnic                              0            0          0        0 


Interestingly, the literature to date on dropouts and on-time graduation has 
indicated that student grade retention is a strong predictor of a student not graduating 
on-time (Jimerson et al., 2002; Jimerson et al., 2005; Laird et al., 2006; Montes & Lehmann, 
2004; Roderick & Camburn, 1999; Roderick et al., 2000). For this study, Table 17 
presents data on descriptive variables for the students retained in the three cohorts for 
which NOTG data was available. Students who were retained and were included in this 
study were students who began 1st grade at the same time as the rest of their cohort, and 
were subsequently held back one or multiple years. Of the students who were retained, 
85.2% did not graduate on-time, 25.9% had IEPs, 33.3% were female, 
and 66.7% were male. If retentions were random and reflective of the overall 
demographic characteristics of the population, one would expect student retentions 
disaggregated by ethnic group to reflect the overall population demographics. However, 
as detailed in Table 17, African American students made up a disproportionate 
percentage of the retained students, 29.6%, in relation to the overall representation of 
African American students in the sample population, 6.1% (compare Table 10 and 
Table 17), and the same trend is true 
for males. These findings again replicate and extend previous findings in the literature to 
the context of these two school districts, indicating that student grade retention is a strong 
predictor of NOTG. 
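
The disproportionality argument above amounts to comparing a group's share of retained 
students with its share of the overall sample. A minimal sketch of that arithmetic follows, 
using the two percentages quoted in the text; the function name and the idea of reporting 
a single ratio are illustrative assumptions, not a statistic reported in this study. 

```python
def representation_ratio(share_in_subgroup, share_in_population):
    """Ratio of a group's percentage among retained students to its
    percentage in the overall sample; 1.0 would mean proportional retention."""
    return share_in_subgroup / share_in_population

# Percentages quoted in the text: 29.6% of retained students versus
# 6.1% of the sample population were African American.
ratio = representation_ratio(29.6, 6.1)
```

A ratio well above 1.0 indicates that retention was not distributed in proportion to the 
overall demographics of the sample. 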

Retaining a student at any grade level is one of the best predictors of dropping out 
(Laird et al., 2006; Montes & Lehmann, 2004) and thus also NOTG, as shown in Table 
17. The literature on risk factors that predict dropping out also includes many other 
variables that have been tested for the ability to assign students as “at-risk” with the 
purpose of predicting, and ultimately preventing, future student dropouts. However, the 
predictive validity of these risk factors is known to be relatively low (Dynarski & 
Gleason, 2002; Gleason & Dynarski, 2002). Some of these risk factors are a single parent 
home, family on public assistance, sibling drop out, absenteeism, disciplinary problems, 
or overage for grade-level, among others. However, individual dropout rates for students 
with each risk factor have all been shown to be below 10% of the students with that risk 
factor at the middle school level, and below 30% at the high school level (Gleason & 
Dynarski, 2002; Laird et al., 2006; Montes & Lehmann, 2004; Weber, 1989). If many of 
these factors are combined using multivariate statistics, the percentage of students 
identified with the multivariate prediction variable who ultimately drop out rises to 23% 
at the middle school level, and 42% at the high school level (Gleason & Dynarski, 2002). 
Also, failing grades at the high school level have been identified as a major risk factor for 
student dropout (Allensworth, 2005; Allensworth & Easton, 2005). However, all of these 
risk factors only accurately identify a subset of the students who ultimately drop out. 

These studies are limited in that the vast majority of the studies have only 
included data on students at the high school level, and to a much lesser extent middle 
school or earlier. This is problematic. If identification of potential dropouts does not 
occur until high school, the deleterious impact of these risk factors over the extended 
period of time before high school is not assessed or included when judging early risk 
factors. The literature on students’ lack of motivation to stay in school indicates that the 
decision to drop out is not based on a single factor or moment, but rather is the cumulative 
effect of multiple risk factors, influencing the student over long periods of time within a 
district (Jimerson et al., 2000). For the districts in this study, as for many districts 
nation-wide, early identification of potential dropouts is critically important so that the 
district can intervene. However, districts lack a cheap and effective method for early 
identification of potential dropouts, earlier than high school, which is able to identify a 
high proportion of potential dropouts. Studies using current risk factors are successful in 
identifying only 30-40% of students who ultimately drop out. These results suggest that a 
district would be unable to identify 60-70% or more of the district’s potential dropouts, 
and may be providing dropout prevention services to a population of students who most 
likely would not have dropped out (Gleason & Dynarski, 2002). One assertion of this 
study is that what is needed is an advance over current risk factors that uses data that 
already exists in schools, that is rapid and cheap (Gleason & Dynarski, 2002), and that 
identifies a higher percentage of potential dropouts at an earlier stage in school than high 
school. The examination of grades and grade patterns through cluster analysis meets 
these specifications. 
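
The two percentages discussed above answer different questions: the share of flagged 
students who ultimately drop out, and the share of eventual dropouts that a risk flag 
catches. A minimal sketch of both calculations follows, assuming hypothetical lists of 
student identifiers; the function name and the example data are illustrative and do not 
come from the studies cited. 

```python
def precision_recall(flagged, dropped_out):
    """Return (precision, recall) for a risk flag.

    precision: share of flagged students who ultimately dropped out
    recall:    share of eventual dropouts who had been flagged
    """
    flagged, dropped_out = set(flagged), set(dropped_out)
    hits = flagged & dropped_out
    precision = len(hits) / len(flagged) if flagged else 0.0
    recall = len(hits) / len(dropped_out) if dropped_out else 0.0
    return precision, recall

# Hypothetical student identifiers
flagged_students = [1, 2, 3, 4, 5]
eventual_dropouts = [4, 5, 6, 7, 8, 9]
precision, recall = precision_recall(flagged_students, eventual_dropouts)
```

In this hypothetical case the flag misses most eventual dropouts while also directing 
services to students who would have graduated, which is the pattern the literature above 
describes for current risk factors. 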

Student patterns of teacher assigned grades are a potentially rich data source 
which may have the ability to predict NOTG in a student’s early schooling career in a 
district. This statement, combined with the possibility that past student grading patterns 
might predict future student grading patterns, is the focus of the remainder of the chapter. 
Additionally, to focus the discussion, and remain centered on the two remaining research 
questions, which refer to overall dataset patterns rather than to district specific questions, 
the following cluster analysis of the dataset will include results only on the overall 
dataset. Examining each district and cohort using the methods detailed below is of 
interest, but is outside the scope of this study. 

Cluster analysis of grades 

Hierarchical cluster analysis of the entire K-12 teacher assigned subject-specific 
grading histories for the full dataset was used to address the two remaining research 
questions detailed in chapter III: to what extent do past grade histories predict future 
grading histories, and can past grade patterns predict student qualitative outcomes, such 
as on-time graduation? 

Cluster analysis has rarely been used in education. The statistical method has the 
potential, however, to help define natural patterns within student and school-level data 
that can be informative for data driven decision making. Examining school-wide student 
data patterns to better address school-wide improvement and student needs has been 
suggested in the past (Lortie, 1975; Schmoker, 1999), but an empirically driven statistical 
method to examine such patterns has been lacking for data driven decision making in 
schools. Cluster analysis can serve this purpose. In short, hierarchical clustering can 
address the issue of whether past grading histories predict future student grades. Once 
cluster patterns are defined, categorical variables for students, such as NOTG, 
can be compared to the clusters, examining if grade pattern specific clusters of students 
show a relationship with either on-time graduation or NOTG. This type of analysis, of 
first clustering and then examining if pattern groups relate to categorical data, was 
pioneered in the biological taxonomy field (Sneath & Sokal, 1973). It has more recently 
gained significant popularity in cancer research and molecular pharmacology as an 
attractive statistical technique for organizing extremely large datasets in which hundreds, 
if not thousands, of variables are collected on hundreds of patients to help predict patient 
outcomes and possible intervention strategies (Bowers et al., 2000; Eisen et al., 1998; 
Kallioniemi, 2002; Lu et al., 2005; van't Veer et al., 2002; Weinstein et al., 1997). In one 
of the earliest cancer studies, researchers analyzed if 5000 different genes were turned on 
or off in 98 different breast cancer tumors from different patients (van't Veer et al., 2002). 
They then used hierarchical clustering on this dataset to give organization to the data. 
Once large patterns of gene expression were defined, the researchers found that specific 
clusters of gene expression patterns correlated with a poor patient prognosis, indicating 
possible avenues for therapeutic and diagnostic research, using the information about 
which genes patterned into specific clusters for either a good or poor prognosis. This 
work has recently been extended, classifying previously difficult to classify tumors using 
cluster analysis of genetic patterns (Lu et al., 2005). 

The cluster analysis employed in this study is of a highly similar nature to the 
cancer studies, but uses students and their grades rather than patients and tumor genes. 
The aim is to define specific clusters of student grade patterns from the past which 
predict specific outcomes, such as on-time graduation or not on time graduation (NOTG). 
Specific grade patterns may be predictive of the specific NOTG outcome, indicating a use for 
grades in predicting NOTG, as well as demonstrating that early grade patterns predict 
future grade patterns, the two research questions addressed in this chapter. 

In addition to offering these insights, cluster analysis is a 
descriptive statistical analysis, having fewer of the statistical assumption problems of 
multiple regression which were detailed in chapter II. In cluster analysis, in comparison 
with multiple regression using district-level data, the violated assumptions of 
multicollinearity, variable and case dependence, and nested data are all positives, giving 
the underlying structure to the data that hierarchical clustering aims to uncover. Also, as 
discussed in chapter II, many leaders in schools and districts have little interest in 
generalizing their data to the population mean, the object of inferential statistical analysis 
such as multiple regression and HLM. However, these decision makers are very 
interested in descriptive statistics, which are able to reveal actionable analyses of their 
students in real time. This study aims to show that hierarchical clustering can help leaders 
in this regard. 

The supposition that drives this chapter is that the hierarchical clustering of the 
entire grading histories of the students in the full dataset should be able to define, at a 
minimum, two clusters: those who graduate on time, and those who do not. In addition, a 
cluster of on-time graduating students should also correspond to students who had taken 
the ACT, while a NOTG cluster would correspond to students who were retained or who 
did not take the ACT, as described above in chapter V. 

Hierarchical clustering of grades 

Teacher assigned subject-specific grades for each student K-12 in the full dataset 
were clustered according to the methods and plotted on an Eisenplot (Figure 19). 
Individual students are on the vertical axis, while subjects per grade level are on the 
horizontal axis (Figure 19, center). Z-scored student grades are represented as a heat map 
across both axes, with more intense red indicating a higher z-score grade, more 
intense blue indicating a lower z-score grade, grey indicating the mean and white 
indicating no data (Figure 19, center). Hierarchical clustering patterns are represented in 
a dendrogram (a cluster tree) with the distance measure indicated at the bottom in 
standard deviation units (Figure 19, left). Student grade pattern clusters were then 
compared with the categorical variables: NOTG, took ACT, retained, transferred into the 
district at any time, transferred out of the district at any time to a valid school (as 
described in the methods), female student, attended West Oak, and was part of the 2006 
cohort (Figure 19, right side, black bars). Each of these variables is dichotomous, such 
that a black bar represents the presence of that variable for that student, and no black bar 
represents the absence. School level is indicated at the top along the horizontal axis. 
Grade-level increases left to right and subjects follow a repeating pattern by grade-level, 
from core subjects to non-core subjects (Figure 19, legend). Student data rows that 
contain a long stretch of white indicate that there was no 
data in the file for those grade-levels for that student. If the white stretch is before middle 
school, it can be assumed that the student transferred into the district. The date of transfer in 
can be derived from where the grade color blocks begin. For data rows which contain a 
stretch of white after middle school, the student either transferred out of the district or 
was NOTG. The last grade that the student completed can be inferred from the last grade 
color block that precedes the stretch of white. 

Figure 19: Eisenplot of hierarchical clustering of teacher assigned subject-specific 
grades, full dataset. 

Cluster analysis of student grades (following page) indicates that for the full dataset, 
student grade patterns cluster into two main clusters: one of students who predominantly 
graduate on-time, and one containing a high percentage of students who do not graduate 
on time (NOTG). Each student is 
aligned along the vertical axis, with subjects by grade-level aligned along the horizontal 
axis. This figure is presented in color. Z-scored student grades are represented by a heat 
map, with higher grades indicated by an increasing intensity of red, lower grades 
indicated by an increasing intensity of blue, the mean indicated by grey, and no data 
indicated by white (center). Hierarchical clusters are represented by a dendrogram (left), 
with a scale in standard deviation units for the distance between clusters across the 
hyperdimensional dataspace (bottom left). Dichotomous categorical variables are represented 
by black bars for each of the categorical variables listed (right) as described in the text. 
The dashed green line through the center heat map indicates the division line between the 
two major clusters in the full dataset (center). School and grade-level are indicated along 
the top horizontal axis (center top). Grade level increases left to right, starting with 
Kindergarten (K); Elementary includes grades 1, 2, 3, 4, 5, and 6, followed by Middle 
School (MS) including grades 7 and 8, followed by high school and grades 9, 10, 11 and 
12. Within each high school grade-level two separate semesters are represented, with 
semester 1 followed by semester 2. Within each grade-level, subjects are listed in a 
repeating pattern as follows: K - mathematics, speaking, writing, reading; Elementary - 
1st-5th - mathematics, reading, writing, spelling, handwriting, science, social studies; 6th - 
reading, mathematics, English, science, band, social studies, physical education, art; 
Middle School - 7th - mathematics, English, science, social studies, band, physical 
education, health, art; 8th - mathematics, English, science, social studies, band, physical 
education, study skills, art; high school - 9th - semester 1 - mathematics, English, 
science, foreign language, social studies, government, economics, band, physical 
education/health, computers, life skills, family skills, art. Semester 2 repeats 9th semester 
1. All other high school grade levels repeat 9th grade subject patterns. 
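
The clustering step behind a figure of this kind can be sketched in a few lines. The 
following is a minimal illustration under stated assumptions and is not the procedure 
specified in the methods chapter: it assumes each subject-by-grade-level column is 
z-scored while skipping missing grades, that the distance between two students is the 
root mean squared difference over the columns in which both have a grade, and that 
clusters are joined by average linkage. The function names and the four example grade 
histories are hypothetical. 

```python
import math

def zscore_columns(rows):
    """Z-score each column, leaving None (no data) untouched."""
    out = [list(r) for r in rows]
    for c in range(len(rows[0])):
        vals = [r[c] for r in rows if r[c] is not None]
        if not vals:
            continue
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        for r in out:
            if r[c] is not None:
                r[c] = (r[c] - mean) / sd if sd > 0 else 0.0
    return out

def distance(a, b):
    """Root mean squared difference over columns where both rows have data."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def average_linkage(rows):
    """Agglomerative clustering with average linkage.

    Returns the merges in order as (members_a, members_b, distance);
    read bottom-up, the list describes the dendrogram.
    """
    clusters = [[i] for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [distance(rows[p], rows[q])
                         for p in clusters[i] for q in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical grade histories on a 4.0 scale; None marks no data
grades = [
    [4.0, 3.7, 4.0, 3.3],   # student 0
    [3.7, 4.0, 3.7, None],  # student 1
    [1.0, 1.3, 2.0, 1.0],   # student 2
    [1.7, 1.0, None, 1.3],  # student 3
]
merges = average_linkage(zscore_columns(grades))
```

The final entry in `merges` joins the two top-level clusters, which corresponds to the 
division marked by the dashed line in a plot of this type. 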
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The hierarchical cluster analysis of the full dataset indicates that the grading data 
pattern into two main clusters: those who predominately graduate on-time, and those that 
have a high percentage of NOTG students (Figure 19, dendrogram and dashed green 
line, compare NOTG column above and below the dashed green line). These two clusters 
are over one standard deviation from each other in the hyperdimensional grading 
dataspace (Figure 19, left bottom). Students in the top cluster appear to have overall 
grade patterns that are over the mean for the dataset, as indicated by the majority of red 
grade data blocks, in comparison to students in the bottom cluster who mostly scored 
below or at the mean for the dataset in multiple courses, as indicated by grey and blue 
grade data blocks (Figure 19, center). One way to simplify analysis of Figure 19 is for 
the reader to take a blank sheet of white paper and cover all but the NOTG data column 
on the right (Figure 19, right). With just the one column of NOTG data showing, the 
difference in the propensity of the bottom cluster to contain NOTG students is striking. It 
appears that the hierarchical clustering algorithm performed well and was able to 
distinguish between grading patterns which predict on-time graduation and grading 
patterns which predict NOTG. If the reader then reveals the Took ACT column, the 
pattern becomes more interesting. The vast majority of the students in the top cluster 
(above the dashed green line) took the ACT and graduated on-time, as detailed in chapter 
V. However, the pattern of students in the bottom cluster who took the ACT appears to be 
more of a gradient, decreasing as one moves down the bottom cluster as the number of 
NOTG students increases. These patterns show that, for the full dataset, grades alone, 
when analyzed using hierarchical clustering, are useful for predicting both NOTG and 
ACT participation. Additionally, while grades have historically been viewed as subjective 
measures of student performance, for this study it appears that high grades do predict 
on-time graduation and ACT participation, while increasingly low grades predict NOTG and 
not taking the ACT. The lower the grades, the more likely the student is to 
not graduate on time and to not take the ACT, and thus the student does not appear to have 
plans to go to college. Students whose grade patterns are nearer the mean over their grading 
histories within the districts appear to graduate on-time at a somewhat lower rate than the 
top cluster, and take the ACT somewhat more frequently than the rest of the bottom 
cluster (Figure 19 bottom, compare the upper quarter which contains clusters of mostly 
grey patterns, with the clusters lower in the bottom cluster). Overall there appear to be 
three main clusters of data: students whose grade patterns are generally above the mean 
across subjects and grade levels, students whose grade patterns are close to the mean, 
and students whose grade patterns are below the mean. Interestingly, the cluster analysis 
shows that students whose grade patterns are at the mean and below the mean cluster 
together with a higher proportion of NOTG students than the cluster of students with 
grade patterns consistently above the mean. 
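
The visual comparison of a cluster with a dichotomous column can also be expressed as a 
simple rate per cluster. The sketch below assumes that each student already carries a 
cluster label and a 0/1 flag such as NOTG; the labels, the flags and the function name are 
hypothetical and are shown only to illustrate the comparison, not to reproduce the 
results of this study. 

```python
from collections import defaultdict

def rate_by_cluster(cluster_labels, flags):
    """Percentage of students in each cluster for whom the flag is set."""
    counts = defaultdict(lambda: [0, 0])  # cluster -> [flagged, total]
    for label, flag in zip(cluster_labels, flags):
        counts[label][0] += int(flag)
        counts[label][1] += 1
    return {label: 100.0 * hit / total for label, (hit, total) in counts.items()}

# Hypothetical cluster membership and NOTG flags for eight students
labels = ["top", "top", "top", "top", "bottom", "bottom", "bottom", "bottom"]
notg = [0, 0, 0, 1, 1, 1, 0, 1]
notg_rates = rate_by_cluster(labels, notg)
```

The same function can be applied to any of the other dichotomous variables, such as 
taking the ACT or having been retained. 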

If the reader reveals the next column of categorical variables, it can be seen that 
the data for the “Retained” variable corresponds to the bottom cluster, not taking the 
ACT, and NOTG, as would be predicted given the discussion of retention above. 
The next two columns, “Transfer In” and “Transfer Out,” 
indicate either if a student transferred into the district at any time, or had a valid request 
for transcripts from another school district and thus transferred out of the district. 
Interestingly, there seems to be little correspondence between transfer in or out status 
and the upper and lower grade pattern clusters. High mobility has been studied in the past 
as an indicator of a student’s potential to have low academic performance and to not 
graduate on time (Demie, 2002; Demie et al., 2005; Montes & Lehmann, 2004; Wells, 
2003); however, some of this research has recently been criticized in that other variables, 
such as SES, may explain lower achievement and not graduating on time rather than 
mobility (Machin et al., 2006; Strand & Demie, 2006). For the data presented here, it appears 
that student transfer in or out of the school districts studied is not an indicator of 
performance or on-time graduation. 

If the reader reveals the next categorical data column, the “Female” variable will
be revealed as an indication of how gender relates to the clustering of student grade
patterns. For the entire dataset, it does not appear that females clustered more or less in
either the upper or lower clusters. However, if the NOTG and Female columns are
compared, it can be seen that for the students in the bottom cluster who were NOTG,
many were also not female, indicating that males did not graduate on time more so than
females, as was discussed above.

The last two categorical data columns are “West Oak” and “2006 cohort”.
By comparing the black bars in these two columns, one can place each student
grade pattern into one of the four cohorts in the dataset (Figure 19, right-most two columns
of right-hand panel): West Oak 1994 (bar, no bar), West Oak 2006 (bar, bar), South Pine
1994 (no bar, no bar), and South Pine 2006 (no bar, bar). Interestingly, while the overall
dataset clusters into two main clusters that appear to correspond to on-time graduation
versus NOTG, specific cohorts of students from one of the two districts sub-cluster
within both the top and bottom clusters, as evidenced by a non-random clustered pattern
in the black bars for the last two columns. This may suggest specific sub-cluster district,
teacher, curriculum or grading policy effects.
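
The bar/no-bar reading of the last two columns can be written out as a small lookup. This is only an illustrative sketch: the function name and the boolean encoding of the bars are assumptions made here, not part of the study's software.

```python
# Hypothetical decoding of the two indicator columns described in the text.
# First flag: bar present in the "West Oak" column.
# Second flag: bar present in the "2006 cohort" column.
COHORTS = {
    (True, False): "West Oak 1994",
    (True, True): "West Oak 2006",
    (False, False): "South Pine 1994",
    (False, True): "South Pine 2006",
}


def cohort_label(west_oak, cohort_2006):
    """Return the cohort name for one student's pair of indicator bars."""
    return COHORTS[(bool(west_oak), bool(cohort_2006))]
```

For example, `cohort_label(True, False)` gives "West Oak 1994", matching the (bar, no bar) case above.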

Therefore, to answer one of the research questions of this chapter, whether student
grade patterns predict a qualitative outcome such as NOTG, the answer for this dataset is
yes. Additionally, if the question is extended to ask if this method is an advance over past
methods of identifying students as “at-risk” of NOTG as described above, the cluster
analysis shows that while only 4% of the students in the top cluster were NOTG (1 of
every 25), 42% of the students in the bottom cluster were NOTG (1 in every 2.4). Thus,
when considering the dataset in total, cluster analysis is as efficient a predictor as the high
school multivariate regression methods cited above by Gleason and Dynarski (2002).
This finding will be further discussed below, and suggests that the method proposed in
this study may be considered superior to past at-risk variables for multiple reasons.

In addition to these longitudinal grading cluster patterns, the two overall clusters 
also indicate a conclusion about the types of courses the students in the top and bottom 
clusters were enrolled in. The horizontal axis can be considered to be clustered in both 
the time and the subject dimensions of the data, in that each grade is listed sequentially 
left to right (clustered by increasing time), and each subject is listed in a repeating order 
within each grade with the core subjects of mathematics, English, science, foreign 
language, and social studies listed first reading left to right before the non-core subjects 
of band, physical education, life skills, and art, among others. Noting the subject 
enrollment patterns from chapter V, this is a logical ordering of core and non-core 
courses. Hence, the horizontal axis can be considered to be clustered according to the 
algorithm of increasing years of schooling and core versus non-core subjects. This
ordering of the horizontal axis thus places the core course subject columns in Figure 19 to
the left-hand side of each grade-level. Also, since each high school grade level contains
two semesters of data, a second set of core course subject columns for the second 
semester is in the center of each grade-level column set. In this way, the overall pattern of 
course enrollment for clusters of students can be determined for the entire dataset. 
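
The column ordering described above (grade level first, then semester, then core subjects ahead of non-core subjects) can be sketched as a sort key. The tuple layout `(grade, semester, subject)` and the function names are assumptions made for illustration; the core subject list is the one given in the text.

```python
# Core subjects in the left-to-right order named in the text.
CORE = ["mathematics", "English", "science", "foreign language", "social studies"]
# Grade levels in increasing time order: K, then grades 1 through 12.
GRADES = ["K"] + [str(g) for g in range(1, 13)]


def column_key(column):
    """Sort key for a (grade, semester, subject) column (hypothetical layout)."""
    grade, semester, subject = column
    # Core subjects keep their listed order; all non-core subjects come after.
    core_rank = CORE.index(subject) if subject in CORE else len(CORE)
    return (GRADES.index(grade), semester, core_rank, subject)


def order_columns(columns):
    """Order heat map columns by time, then core versus non-core subject."""
    return sorted(columns, key=column_key)
```

With this key, the second-semester core columns of a high school grade fall after that grade's first-semester non-core columns, which is why they sit in the center of each grade-level column set.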

Interestingly, this pattern in Figure 19 suggests that students in the upper cluster
receive high grades, graduate on-time, take the ACT, and additionally, take core subjects
through 11th grade and to some extent into 12th grade. This is evidenced by the more solid
columns of red blocks in Figure 19 at the left side and middle of each grade-level,
indicating that these students took many core subjects and received high grades in them.
For the lower cluster, the difference in the pattern of classes the students took at the high
school level is striking, especially for the lowest quarter of the lower cluster. While the
students in the lower cluster appear to have taken core subjects in the 9th and 10th grade,
at 11th grade the pattern diverges from the upper cluster, and it can be seen that the lower
cluster students take a much wider variety of subjects. Not surprisingly, a gradient that
parallels participation in the ACT appears to be at work in the lower cluster, in that as one
proceeds down from the dashed green line, students appear to have been enrolled in
fewer core courses in 12th grade, then 11th grade, then 10th grade nearest the bottom of the
lower cluster. For the dataset, this result may indicate that students who were receiving
low grades and who had a higher probability of NOTG than the upper cluster were taking
courses in fewer core subjects at the high school level, especially by 11th and 12th grade.

The other research question of this chapter asks if past grading patterns predict
future grading patterns, the Hargris hypothesis. The cluster analysis provides evidence to
support this hypothesis as well. Overall, the answer to the question is yes, but with some
caveats. From the data presented in Figure 19, students who are assigned high grades in
early elementary school (grades K, 1, 2, and 3) generally receive high grades throughout
their career for all four cohorts (Figure 19, upper cluster). The same appears to hold true
generally for students whose grades early on are at the mean or below the mean, in that
those students who receive low grades early appear to continue to receive low grades
throughout their schooling career. Thus, this study provides initial empirical evidence to
support the Hargris hypothesis discussed in chapters II and III that early high grades, in
general, appear to launch a student into a cycle of motivation and achievement, while
early low grades appear to lock a student into a continual cycle of low grades and
achievement (Hargris, 1990).

Examining specific sub-clusters as illustrative of the overall upper and lower
cluster patterns in Figure 19 is useful to help understand these two overall patterns. These
two patterns are termed “high-high” and “low-low”, indicating high grades in early
elementary and in high school with on-time graduation by 12th grade (Figure 20), or low
grades in early elementary and low grades by high school with a large proportion of
students NOTG (Figure 21). The high-high cluster pattern, here represented by a
sub-cluster of 42 students from the upper cluster in Figure 19, indicates that for this dataset,
students who are awarded grades at and above the mean in early elementary school
generally go on to earn high grades across all subjects throughout elementary, into middle
school, and throughout high school, with eventual on-time graduation (Figure 20).
Conversely, the low-low cluster pattern, here represented by a sub-cluster of 32 students
from the lower cluster in Figure 19, indicates that for this dataset, students who are
awarded grades at and below the mean in early elementary school generally go on to
earn low grades across all subjects throughout elementary, into middle school and
throughout high school, with a high proportion NOTG (Figure 21). Thus, the data for this
study provide initial evidence which supports the Hargris hypothesis.


[Heat map panel; horizontal axis: K, Elementary, MS, 9th, 10th, 11th, 12th]

Figure 21: Low-low student grade pattern sub-cluster, K-high school


However, the Hargris hypothesis may be only a general pattern observed for the
upper and lower clusters in Figure 19. The type of student for whom research has
historically been lacking is both the student who starts with high grades in early
elementary school but eventually receives low grades (a “high-low” pattern) (Figure 23),
and the student who starts with low grades in early elementary school but eventually
receives high grades (a “low-high” pattern) (Figure 22). It appears that clusters of
students of these types do exist in this dataset. Further examination looks at the relation
of each cluster of students to NOTG. To address these issues, individual sub-clusters
from Figure 19 are examined.

First, the low-high pattern will be considered. In the upper cluster of Figure 19, a
sub-cluster of 17 students can be identified in which students had a low-high pattern: low
early elementary grades in grades K, 1, 2, and 3 that gradually rose to the mean in grades
4 and 5, and exceeded the mean in grade 6 and in middle school (Figure 22). As shown in
Figure 22, while the overall student grade patterns for the 17 students in the sub-cluster
leveled off generally near the mean in high school across the cluster, all of the students
graduated on time and most of them took the ACT. Interestingly, the majority of this
cluster of students attended South Pine, although they belonged to both the 1994 and
2006 cohorts. Thus, the low-high grade pattern does exist for a subset of the students
studied. While low early grades do appear to generally follow the Hargris hypothesis for
the full dataset of locking students into a pattern of low grades and a higher probability of
NOTG, for the students in the sub-cluster in Figure 22, something happened for them at
about the time they attended 2nd and 3rd grades. Some common experience might have
helped them improve so that they earned mean grades across subjects by 4th grade, and
then continued to improve to grades above the mean in middle school, leveling off
somewhat above the mean in high school, and ultimately graduating on time.

Figure 22: Low-high student grade pattern sub-cluster, K-high school

Similarly, an example of the high-low sub-cluster student grade pattern also exists
in the dataset, in which students from the lower cluster in Figure 19 were at or just above
the mean grades in multiple subjects at the elementary level, but then their grades drop to
below the mean in middle school and high school (Figure 23). This pattern appears to be
associated with high rates of NOTG and lower rates of ACT participation for all 37
students in the sub-cluster. Thus, it appears that for this sub-cluster of students who
achieved at or above the mean in early elementary school, these students’ grades
eventually begin to fall in late elementary and then throughout middle and into high
school, with a high propensity to not graduate on time, nor take the ACT (Figure 23).

Figure 23: High-low student grade pattern sub-cluster, K-high school

These cluster patterns reflect the z-scored grades of students across each subject
for the full dataset. It is of value to examine the actual grades within the patterns of each
major cluster identified. This is especially important when considering the variables that
predict “at-risk” of not graduating on time. Past research has relied heavily on a student
receiving a failing grade in one or multiple subjects (Alexander et al., 2001; Allensworth,
2005; Allensworth & Easton, 2005; Gleason & Dynarski, 2002; Montes & Lehmann,
2004). A failing grade is predictive of not graduating on time; however, failing grades are
not usually given until middle or high school. Most of the studies on failing grades are
only able to analyze high school data; as discussed above, identification of students as
“at-risk” at the high school level may be many years after a point when intervention is
optimal. If the student was identified as having trouble at a point early in schooling,
intervention might have helped that student join a cluster such as the low-high cluster,
rather than the high-low or low-low clusters.

For the full dataset, the first failing grade occurs for one student at the 6th grade
level, and then the frequency of failing grades slowly rises into the high school years.
Remember that the clusters presented above are based on z-scored grades K-12, so
year-to-year differences in grading scale are normalized. In fact, a failing grade in high school
is similar in z-score to a B- or a C+ at the early elementary level. Both an F in high
school and a B- in early elementary school are at the bottom of the relative scales at each
grade level. The elementary grade, however, would also indicate satisfactory work even
though the student receiving the grade might be at the bottom of the class in terms of
performance. An examination of actual grades received across students’ school histories
helps explore these possibilities.
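
A small worked example may make the normalization concrete. The sketch below z-scores one column of grades at a time; the grade-point values are invented for illustration, and the use of the population standard deviation is an assumption, since this section does not state which form the study used.

```python
from statistics import mean, pstdev


def zscore_column(grades):
    """Z-score one column (one subject at one grade level) across students.

    None marks a missing grade and is passed through unchanged.
    Assumes at least two distinct grades, so the standard deviation is nonzero.
    """
    present = [g for g in grades if g is not None]
    mu, sigma = mean(present), pstdev(present)
    return [None if g is None else (g - mu) / sigma for g in grades]


# Invented grade-point values, for illustration only.
early_elementary = [4.0, 3.7, 3.3, 3.0, 2.7]  # lowest mark is a B-
high_school = [4.0, 3.0, 2.0, 1.0, 0.0]       # lowest mark is an F

z_elem = zscore_column(early_elementary)
z_hs = zscore_column(high_school)
```

In this toy data the B- (2.7) and the F (0.0) both land roughly 1.4 standard deviations below their own column means, even though the raw grades differ by almost three grade points.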

To examine actual grade patterns for students in the upper cluster and lower
cluster in Figure 19, the mean non-cumulative GPA for each grade-level for the full
dataset was plotted K-12, with high school grades by semester 1 followed by semester 2
(Figure 24). Figure 24 shows that the mean GPA of upper and lower cluster students did
not converge at any time across all 17 timepoints. Additionally, the trends begin to
diverge at and after the grade-level, with the upper cluster maintaining over a 3.0 GPA
(a “B” letter-grade), while the lower cluster declined throughout elementary and middle
school, leveling off at a 2.0 (a “C” letter-grade) at the high school level. These grading
trends are especially significant considering that 42% of the lower cluster students were
NOTG, while only 4% of the upper cluster students were NOTG.

Figure 24: Mean non-cumulative GPA trends for the upper and lower clusters, K-12
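
The trend lines of this kind of plot amount to a per-cluster mean at each timepoint. A minimal sketch follows, assuming one row of GPAs per student and one cluster label per student; the data layout and the function name are hypothetical.

```python
from statistics import mean


def mean_gpa_by_cluster(gpa_rows, labels):
    """Average GPA at each timepoint, separately for each cluster label.

    gpa_rows: one list per student, one GPA per timepoint (None = no data).
    labels:   one cluster label per student, e.g. "upper" or "lower".
    Raises statistics.StatisticsError if a cluster has no data at a timepoint.
    """
    trends = {}
    for label in sorted(set(labels)):
        rows = [row for row, lab in zip(gpa_rows, labels) if lab == label]
        # zip(*rows) walks the timepoints; missing grades are skipped.
        trends[label] = [
            mean(v for v in column if v is not None) for column in zip(*rows)
        ]
    return trends
```

Whether two clusters ever converge can then be checked by comparing the two returned lists timepoint by timepoint.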


A similar plot can also be constructed which compares the non-cumulative GPA of the
four example sub-clusters above: high-high, high-low, low-high, and low-low
(Figure 25). While the high-high and low-low clusters in Figure 25 correspond to the
trends seen in Figure 24, the high-low and low-high mean cluster GPAs show an
interesting trend that has rarely been discussed in the education literature, namely
students who start with relatively high grades but then are awarded lower grades over
time (high-low), and even less frequently studied, students who start low but then are
awarded high grades over time (low-high).

Figure 25: Mean non-cumulative GPA trends for clusters high-high, high-low, low-high
and low-low, K-12


The overall trends suggest interesting patterns for the four clusters plotted in
Figure 25. First, recall that both the high-high and low-high clusters are
members of the upper cluster in Figure 19, which corresponds to on-time graduation,
while the high-low and low-low clusters are members of the lower cluster in Figure 19,
which corresponds to much higher rates of NOTG. Both Figure 24 and Figure 25 indicate
that for this dataset, students whose non-cumulative GPAs are below the mean have a
much higher chance of NOTG than students whose grades are above the mean (Figures
24 & 25, compare above and below the grey line). Second, the low-high pattern is
striking, in that the low-high student grades track similarly to the low-low student grades
in first grade, but then diverge in second grade and continue to rise, passing the mean
and the downward trend of the high-low cluster in the 4th grade. The high-low cluster
grades start just below the high-high students, but in 4th grade begin to decline, passing
the upward trending low-high cluster students and falling below the mean by 6th grade,
and trending similarly to the low-low student cluster by high school. The high-high
students appear to have been awarded high grades throughout their career, while the
low-low students start with some of the lowest grades in the scale in early elementary and
continue to receive low grades throughout their career (Figure 25). Again, the grading
data for the high-high and low-low clusters support the Hargris hypothesis that past
grading patterns predict future grading patterns. However, the high-low and low-high
clusters, while subsets of the overall upper and lower cluster patterns (Figure 19 and
Figure 24), contradict the Hargris hypothesis.

For these two clusters, high-low and low-high, two inflection points in time
appear to be evident: 2nd grade for the low-high cluster, and 4th grade for the high-low
cluster (Figure 25). For the low-high cluster, kindergarten and 1st grade grades are nearly
identical to the low-low cluster. If students were identified as at-risk of NOTG on K-1
grades alone, then the students in the low-high cluster would be misidentified. In 2nd and
3rd grade the low-high student cluster is still well below the mean; however, their grades
are trending higher. Interestingly, in 4th grade the low-high students surpass the mean and
cross the downward trend of the high-low students, and then continue to rise throughout
middle school, decline somewhat in high school, and ultimately graduate on time with the
high-high students.

For the high-low students, in 1st through 3rd grades these students appear to be on
track with the high-high students, and so if identified as at-risk through early elementary
grades these students would not be identified as lower cluster students. At 4th grade, the
high-low students’ grades begin to decline, fall below the rising low-high student cluster,
then fall dramatically as the students enter middle school.

The central question for these two clusters is: what was the difference in the
experiences of these two groups of children? Because these children’s grades
patterned similarly to each other in Figure 19, something similar may have occurred for
the children within each of the clusters, and the timepoints are indicated in Figure 25:
2nd grade for the low-high cluster and 4th grade for the high-low cluster. Additionally, both of
these clusters correspond to the low-low cluster. It appears that something changed for
the low-high students in 2nd grade that may have caused them to diverge from the
low-low cluster, while something may have happened for the high-low cluster in the 4th grade
so that the student grade patterns ultimately join the low-low cluster by high school.
Figures 19 and 20 indicate that students from all four cohorts are within both the
high-low and low-high clusters, both 1994 and 2006 and West Oak and South Pine, suggesting
that what occurred most likely was not due to a cohort effect. Was the similarity in grade
patterns due to student aptitude or teacher assistance? What can school leaders and
teachers do to help more students join the low-high cluster rather than stay in the low-low
cluster? Can the GPA decline of the high-low cluster be prevented? These issues will be
taken up in chapter VII.

The analyses reported in this study support Hargris’s hypothesis (1990).
Generally for this dataset, students who received high grades early in elementary school
continued to receive high grades, and students who received low grades early in
elementary school continued in a cycle of low grading throughout their career in the school
districts. However, these trends appear to be only general trends, since the data presented
above suggest that for a subset of the data, some students who start low do attain high
grades at the higher grade levels, and these students appear to graduate on time. Another
subset of students start high in the grades they receive, but then receive low grades as
they progress through the system. These students appear to not graduate on time as often.
This is the first time that student grade patterns have been examined empirically for entire
cohorts of students in this way, and the first time that the high-low and low-high patterns
of student grades have been explicated for such a dataset. As with the questions posed in
the previous paragraph, the implications of the existence of these patterns will be
discussed below in chapter VII.

The data presented in this chapter thus far suggest that grade patterns can be used
to identify students “at-risk” of NOTG. The methods presented here surpass previously
used identification methods in multiple ways, including positively identifying 42% of the
students who did not graduate on time using currently existing data. The grade trends
appear to stabilize by high school. The data presented in Figures 24 and 25 suggest that
the hierarchical cluster analysis may produce results similar to those in Figure 19, even
without the high school data included. It appears from the data presented above that the
grading patterns of the upper and lower clusters vary in elementary and middle school,
especially for the clusters of students who do not conform to the Hargris hypothesis, the
low-high and high-low students. But by the end of middle school this variance subsides
and student grade trends appear to remain relatively stable throughout high school.

With this knowledge, this study can turn to the question of how efficient a
predictor student grade patterns are before high school. This is an important question for
two main reasons. First, as discussed above, early identification of students at risk of NOTG is
considered desirable by educational leaders engaged in data driven decision making so
that additional assistance can be directed to students who may not graduate on time.
Second, the best at-risk prediction variables from the literature are mostly at the high
school level, including failing grades. In contrast, in the literature, middle school
prediction variables of students at-risk of NOTG overall are considered to be much less
accurate than at the high school, with the best methods accurately predicting only about
23% of the students who eventually do not graduate on time (Gleason & Dynarski, 2002).
Thus, a method that identifies students before high school with accuracy higher than 20%
would be of value.

To explore these issues, the high school grades were removed from the full
dataset, creating a K-8 dataset containing the z-scored subject-specific grades for each
student K-8. This K-8 dataset was reclustered using the same hierarchical clustering
methods as for Figure 19 (Figure 26). Similar to the clustering of
the full dataset above, the K-8 clustering identified two main clusters: students whose
K-8 grades were generally high across subjects and who eventually graduate on-time
(Figure 26, center panel upper cluster above the dashed green line) and students whose
K-8 grades were generally low across subjects and grade-levels and who had a higher
frequency of eventual NOTG (Figure 26, center panel lower cluster below the dashed
green line). The cluster dendrogram shows that the data are categorized into these two
main clusters (Figure 26, left panel), and that students in the lower cluster were more
frequently NOTG than the upper cluster (Figure 26, right column, black bars indicate
NOTG).

Figure 26: Eisenplot of hierarchical clustering of teacher assigned subject-specific
grades, K-8 dataset

Cluster analysis of student grades (following page) indicates that for the K-8 dataset, 311
student grade patterns cluster into two main clusters, those who eventually graduate
on time, and a high percentage of students who do not graduate on time (NOTG). Each
student is aligned along the vertical axis, with subjects by grade-level aligned along the
horizontal axis. This figure is presented in color. Z-scored student grades are
represented by a heat map, with higher grades indicated by an increasing intensity of red,
lower grades indicated by increasing intensity of blue, the mean indicated by grey, and
white indicating no data (center). Hierarchical clusters are represented by a dendrogram
(left), with a scale in standard deviation units for the clusters across the hyperdimensional
dataspace (bottom left). The dichotomous categorical variable NOTG is represented
by black bars (right). The dashed green line through the center heat map indicates the
division line between the two major clusters in the K-8 dataset (center). School and
grade-level are indicated along the top horizontal axis (center top). Grade level increases
left to right, starting with Kindergarten (K), then Elementary, which includes grades 1, 2, 3, 4, 5,
and 6, followed by Middle School (MS), which includes grades 7 and 8. Within each
grade-level, subjects are listed in a repeating pattern as follows: K - mathematics, speaking,
writing, reading; Elementary - 1st-5th - mathematics, reading, writing, spelling,
handwriting, science, social studies; 6th - reading, mathematics, English, science, band,
social studies, physical education, art; Middle School - 7th - mathematics, English,
science, social studies, band, physical education, health, art; 8th - mathematics, English,
science, social studies, band, physical education, study skills, art.

In contrast to the differences between past NOTG at-risk prediction methods
using high school versus middle school data, cluster analysis of K-8 grades (Figure 26) is
almost as accurate as cluster analysis of K-12 grades (Figure 19) in predicting NOTG.
Specifically, as detailed above, past at-risk prediction methods using regression analysis
are able to predict at best 42% of the students who would not have graduated on time
using high school data (defined as students dropping out), but using middle school data
only 23% of the students who would not have graduated on time by the end of high
school were identified (Gleason & Dynarski, 2002). However, as previously discussed,
early and more accurate identification of students at risk of NOTG is desirable. The data
for this study show that by utilizing K-12 grade data, cluster analysis accurately predicts
42% of the NOTG students (Figure 19), similar to the literature. Figure 26 extends this
level of prediction to the K-8 dataset, in which clusters are again identified, and Table 18
further extends these findings by comparing the findings of Gleason and Dynarski (2002),
who used current at-risk prediction methods with just high school or middle school data, with
the cluster methods presented here. Hierarchical clustering of K-8 data shows that 10% of
the students in the upper cluster were eventually NOTG (1 in 10) (Figure 26) and 40% of
the students in the lower cluster were NOTG (1 in 2.5) (Table 18), using only K-8 data. Thus,
1 in 2.5 students whose grades pattern similarly to the students in the lower cluster in
Figure 26 are NOTG. This is an advance over past at-risk predictors (Table 18),
identifying a method which would allow school leaders to better predict students at risk
of NOTG before they enter high school, by which time the above data show that the
students are relatively stable in their performance and outcomes.

To further explore how early cluster analysis is able to predict NOTG with
accuracy that exceeds current methods in the literature, additional cluster analyses were
performed using a K-6 and a K-1 dataset, reclustering the data for each smaller dataset
(Appendix Q), and the accuracy of the prediction of the upper and lower clusters was
assessed for all four clustered sets of data and compared to the previous findings of
Gleason and Dynarski (2002) (Table 18).

Table 18: Cluster prediction accuracy from grades of NOTG by dataset

Dataset            Gleason & Dynarski (2002)    Lower Cluster

K-12   % NOTG      42%                          42%
K-8    % NOTG      23%                          40%
K-6    % NOTG      -                            30%
K-1    % NOTG      -                            27%
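
The reclustering procedure behind this kind of table (truncate the grade columns, cluster into two groups, report the share of NOTG students in the lower group) can be sketched as follows. This is a minimal illustration only: Euclidean distance and average linkage are assumptions made here, the function names are hypothetical, the sketch assumes no missing grades, and a library clustering routine would normally replace the hand-written loop.

```python
from itertools import combinations
from math import dist


def two_clusters(rows):
    """Agglomerative average-linkage clustering, stopped at two clusters.

    rows: equal-length lists of z-scored grades with no missing values.
    Returns two lists of row indices.
    """
    clusters = [[i] for i in range(len(rows))]

    def avg_distance(a, b):
        # Mean pairwise Euclidean distance between members of a and b.
        return sum(dist(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 2:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_distance(*pair))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return clusters


def lower_cluster_notg(rows, notg, n_columns):
    """Recluster on the first n_columns and return the lower cluster's NOTG rate.

    notg: one 0/1 flag per student (1 = did not graduate on time).
    """
    truncated = [row[:n_columns] for row in rows]
    clusters = two_clusters(truncated)
    # The "lower" cluster is the one with the lower mean z-scored grade.
    lower = min(clusters, key=lambda c: sum(sum(truncated[i]) for i in c) / len(c))
    return sum(notg[i] for i in lower) / len(lower)
```

Dividing 1 by the returned rate gives the "1 in N" form used in the text; for example, a rate of 0.40 corresponds to 1 in 2.5 students.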


Cluster analysis of student grades identified two main clusters for all four grading
datasets, K-12, K-8, K-6, and K-1, in which the upper cluster corresponded to higher
grades and a lower rate of NOTG and the lower cluster corresponded to lower grades and
a higher rate of NOTG (Table 18). The most accurate prediction of NOTG was the cluster
analysis of the K-12 and K-8 datasets. Interestingly, the cluster analysis of the K-6 and
K-1 datasets accurately identified over 26% of the students who eventually did not
graduate on time. This method, using just kindergarten and grade 1 teacher assigned
subject-specific grades, exceeds the accuracy of the best at-risk prediction methods
described in the literature at the middle school level. This is a significant finding and will
be discussed further in chapter VII.

CHAPTER VII: DISCUSSION


The data for this study support five main findings when considering grades as
potentially useful for data driven decision making by school and district leaders. 1)
Tentative findings were detailed that suggest that grades may be an assessment of both
academic knowledge and a success at school factor. 2) Grades and standardized
assessments may be converging over time, a finding only partially supported in one of the
two districts studied. 3) Past student grade patterns are useful in predicting future student
grade patterns, partially supporting the Hargris hypothesis. 4) The Hargris hypothesis
does not hold true for all students. One cluster of students receive high grades in
early elementary school, earn increasingly lower grades by middle school, and are at risk of
NOTG. A different cluster of students receive low grades in early elementary school, but
seemingly overcome this early deficit to exhibit rising grades throughout later elementary
school and middle school, ultimately graduating on time. 5) Student K-12 longitudinal
grade patterning using cluster analysis is as good, or better, a predictor of students at-risk
of not graduating on time with their cohort as current at-risk predictors from the
literature. Overall, these results show that teacher assigned subject-specific grades, rather
than being subjective and unreliable measures of student performance, are useful for
day-to-day decisions made by teachers and administrators. These grades should be used, not
printed on report cards and then locked away in school basements and forgotten. This
study shows that grades are useful as assessments of student school performance and are
useful predictors of future grading patterns and on-time graduation. Because of these
findings, it is argued here that grades can be used for data driven decision making by
school leaders, informing parents, teachers, principals and central office staff of potential
future student grades and on-time graduation.

While the findings of the study are promising, there are multiple issues with the 
validity and generalizability that require discussion. First and most significant is the 
biased and intact nature of the student samples. Students were not selected randomly 
from a large population; rather, two small first-ring suburb districts were selected as a 
sample of convenience, and two cohorts within those districts (the graduating classes of 
1994 and 2006) were selected based on the data available in the students’ permanent
record folders. The conjecture for this study is that when studying small intact samples,
similar to the real-world data analysis performed daily by principals and district
administrators in schools across the nation, it is advantageous to include every student for
whom data exist in a school for the cohorts examined. This eliminates internal validity
issues due to sample bias and, as detailed above, rather than estimating the
population means through inferential statistics such as linear regression, the actual
population means for each cohort are known. Thus, it is argued here that school leaders
should be encouraged to include every student in their district in data analysis, rather than 
choose a sample. This could have very interesting implications, especially for large 
districts in which student data is warehoused electronically for thousands of children. 
Access to data on the entire population of interest increases the statistical power of 
significance tests and allows principals and administrators the ability to understand the 
data for their students. In this way they don’t have to generalize to the mean for all 
students in the nation in order to say something important about the students in their 
schools. It must be acknowledged that as with any statistic, generalizability beyond the
sample is problematic. But generalizability is not an issue when principals and district
administrators are concerned with their entire sample of students, rather than students 
outside of their districts. Conversely, in generalizing this study to the broader context of 
K-12 education across the United States, this issue of small and intact sample size should 
be taken into account. 

For the four cohorts detailed here, the findings of this study are applicable to and 
inform the two districts of what occurred with the correlation and patterns of grades over 
time with these four cohorts. But do the findings of this study have any relation to other 
schools and districts? This is the classic question for all research. This study should be
considered a pilot study, with initial but as yet uncorroborated and unreplicated
findings. Nevertheless, the findings of this study should inform other school contexts
because of three features of the study’s design: two districts, two cohorts within each district,
and the inclusion of all of the on-file data within each cohort. By including two districts
with two cohorts separated by 12 years, cohort and district effects influencing the results 
are moderated. However, with only two of each, cohort and district effects must be 
considered as viable explanations for all of the results of this study until confirmed 
elsewhere. Additionally, by including the entire cohorts, rather than a random sample of 
each cohort, internal sample bias due to random sampling is reduced, while increasing the 
overall number of student cases, and thus the power of the overall study. Although these 
methods do help with the external validity of this study, the generalizability of the
findings presented here is questionable when considered in other contexts. The study
demands replication.

A suggested follow-up to this pilot work is a similar study that includes all
students from multiple districts in a mid-sized
metropolitan area. A proposed study such as this could include 10 to 20 districts, and
thousands of students, with multiple cohorts from each district represented. If student
grade patterning across districts and cohorts in a study such as the one proposed appears
similar to the findings presented here, then this would provide a large boost to the
generalizability of these findings. In addition, with more students, districts and cohorts,
additional clusters of student grade patterns may be discovered. One could imagine
students who do well in elementary school but experience problems beginning in middle
school or high school, in addition to the four sets (high-high, high-low, low-high and
low-low) detailed here.

Additionally, recent evidence has emerged in the broader cluster analysis
literature for biological and bioinformatics sciences which has relevance to cluster
analysis of educational data. New studies argue for the inclusion of thousands of cases,
rather than the average that is used in the biological sciences, which is 100 cases or fewer
(Dolled-Filhart et al., 2006; Ein-Dor et al., 2006; Sima & Dougherty, 2006; Sorlie et al.,
2006). The present study, with 361 cases of student grades, falls between the low
end of about 100 cases and the 1000 or more cases argued for in the cluster literature.
This debate will surely continue in the bioinformatics literature. These
methods should be replicated in a larger and broader context of districts and schools, such
as the study proposed above, to determine whether the results replicate in a different context. In
the next sections, the research questions are revisited in turn.

The Correlation of Grades and Standardized Assessments

This study has addressed three main research questions, and come to many
conclusions. The first research question deals with the possibility that grades and
standardized tests, while considered separate assessment regimes in the past, may be
converging over time due to the increasing pressures from the accountability movement
at both the state and federal levels, penetrating into the classroom and modifying
teachers’ curriculum decisions to align with the state standardized tests. This has
been discussed and hypothesized in the literature (Busick, 2000; Carr, 2000; Carr & Farr,
2000; Porter & Smithson, 2001; Shepard et al., 2005; Streifer, 2004; Waters, 2000). If
true, school leaders have another tool in data driven decision making through the use of
grading systems that are correlated with and help predict student state assessment scores.
Yet this idea that grades and standardized assessments are converging over time, and thus
that the correlation between the two systems is rising, has not been empirically tested using
subject specific grades prior to the present study. This study presents initial evidence that
appears to be mixed on this issue. For one of the districts, West Oak, it does not appear
that grades and one standardized test, the ACT, are becoming more correlated, while for
the other district, South Pine, the correlations, while not statistically significantly
different, do appear to be increasing. This result is not surprising given the known
variable nature of curriculum, instruction and assessment in schools and districts.
Additionally, any difference or non-difference between the 1994 and 2006 cohorts
in either district could be explained entirely as a cohort effect, in which, for example, the
2006 South Pine students happened to be a group whose grades and ACT scores correlated
more highly than those of the 1994 South Pine cohort.
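
As an illustration of what "not statistically significantly different" means when two cohort correlations are compared, the sketch below applies the standard Fisher r-to-z test for two independent correlations. It is not necessarily the procedure used in this study. The function name and every numeric input are hypothetical placeholders, not the West Oak or South Pine results.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 for two independent samples.

    r1, r2 are Pearson correlations; n1, n2 are the sample sizes (each > 3).
    Returns the z statistic (for r2 minus r1) and its two-sided p-value.
    """
    z1 = math.atanh(r1)  # Fisher r-to-z transform
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z2 - z1) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical inputs for illustration only (assumed, not from the study):
# an earlier cohort with r = 0.45 (n = 60) and a later cohort with r = 0.60 (n = 70).
z, p = fisher_z_difference(r1=0.45, n1=60, r2=0.60, n2=70)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

With cohorts of this assumed size, an apparent rise in the correlation does not reach conventional significance, which is the situation described above for South Pine.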




Bowers, A.J. (2007) 



Additionally, these correlation results are questionable because the state
standardized assessment scores could not be used to compare the 1994 and 2006 cohorts,
since the test scores from the two time points were not on similar scales and the West
Oak 1994 cohort only included students who graduated on time. Consequently, a less desirable
standardized test, the ACT, which was given to only a subpopulation of the student
sample, was used; this may also have led to a biased and erroneous result. As with all of
the conclusions of this study, but especially for the hypothesis of the increasing
correlation of grades and standardized assessments, the results must be replicated in a
larger setting to better understand if grades and standardized assessments are converging.
In addition, as discussed in chapter III, the data presented here is only baseline data for
two time points. The question remains whether the increasing correlation is a trend over
time between these two time points, and whether the increasing correlation is due to
accountability pressures or to some other influence in the school. This may only
be ascertained through additional qualitative studies in which teachers and administrators
are interviewed and/or surveyed to gain an understanding of what may have led to a
potential convergence of grades and standardized assessments. However, the results
presented here also suggest an interesting follow up study. Since grades and ACT scores
may be converging somewhat for one of the districts but not the other, these two districts
may present a natural comparison for such a qualitative study. The processes of
each district and their approaches to the state standards, the ACT, and grades may
differ in ways that are of interest. A possible qualitative study of this
difference could shed light on how school districts are reacting and adapting to the
accountability policies and the pressures of state curriculum frameworks and
assessments.

A Success at School Factor (SSF) 

The second major finding stemming from the first research question on the
correlation of grades and standardized assessments is that the data presented here suggest
not only that grades are useful assessments for consideration for data driven decision
making by educational leaders, but also that grades might measure a Success at
School Factor (SSF). The fact that standardized assessments such as the ACT and the
state’s high school assessment for these two districts moderately and significantly
correlate with subject specific grades is not a new finding. Past research has shown that
grades and standardized assessments not only correlate (Brennan et al., 2001; Woodruff
& Ziomek, 2004), but that ideally, grades should provide criterion validity for
standardized tests and thus should, at the least, moderately correlate with standardized
tests (Linn, 1982). But though standardized tests are reported to school and district
leaders, policy makers, and the press, with the implication that standardized test scores
should be used to drive improvement in schools (Linn, 2000), as codified in the NCLB
legislation, grades are rarely used for data driven decision making in schools. Grades are
seen as subjective and inconsistent, as demonstrated in the discussion of hodge-podge
grading practices (Brookhart, 1991; Cizek, 2000; Cizek et al., 1995-1996; Cross & Frary,
1999; Linn, 1982; Shepard et al., 2005). This leads one to conclude that while the lives of
students and teachers revolve around compliance with and creation and assessment of
grades and grading practices (Bailey, 1976; Hargis, 1990; Kirschenbaum et al., 1971; S.
Simon, 1976), much of this work seems to be ignored by administrators, policy makers,
and the government in the current accountability movement. Darling-Hammond and
associates, in a chapter from their recent book on what teachers should know and learn
for effective instructional practice, state “there are three important audiences for grades:
parents, external users such as employers and college admissions officers, and students
themselves” (Shepard et al., 2005, p.298). Grades are not seen as useful data; teachers,
school leaders and district personnel are absent from the “important audience” for the
grades teachers assign. In stark contrast to this omission, one of the central arguments of
this study is that not only are grades useful to the school in which they are created, but
that grades are useful because they may be an assessment of both academic knowledge
and how well a student is able to engage in the social processes of being schooled.

While the data presented here should be considered tentative, calling for
replication, the results of this study suggest that when teachers assign grades, those
grades are an assessment of two variables: a student’s academic knowledge, and a
student’s ability to negotiate the social processes of school, namely a success at school
factor (SSF). This was evidenced through the moderate correlation of ACT and state
standardized test scores with core subject grades, but not with non-core subject grades,
even as core and non-core subject grades moderately correlated with each other. These
results suggest that these two sets of correlations, ACT with grades and core grades with
non-core grades, explain two different variance structures in the data. Assuming that the
ACT and the state standardized test assess academic knowledge, then it can be
hypothesized that the moderate correlation of the ACT with core-subject grades is a
measure of the academic content of those subjects, explaining about 25% of the variance
in core-subject grades (correlation of about 0.5). However, as the ACT did not correlate
with non-core subject grades, it might be an indication that those subjects did not contain
the academic knowledge that the ACT assesses. This is corroborated by the state
standardized test scores also not correlating with non-core subject grades. Conversely,
core and non-core subject grades moderately correlated, indicating a similarity in the
variance structures between core and non-core grades that does not exist between grades
and the two standardized assessments studied. Again, cautioning that these results are
preliminary, it is the hypothesis here that the correlation between core and non-core
grades represents a success at school factor. The findings, however, may only be specific
to the students studied and thus have little external validity. Also, the correlation
evidence should be considered relatively weak due to the small sample sizes and the
overlapping confidence intervals.
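
The arithmetic behind "about 25% of the variance" and the caution about confidence intervals can be sketched as follows. The squared correlation gives the shared variance, and a Fisher-transform interval shows how wide the uncertainty around a correlation of 0.5 is in a small sample. The sample size and the function names below are assumptions chosen for illustration; they are not values reported in this study.

```python
import math


def shared_variance(r):
    """Proportion of variance shared by two variables with Pearson correlation r."""
    return r * r


def fisher_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation via Fisher's z.

    n is the number of paired observations (must exceed 3).
    """
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


r = 0.5  # the approximate ACT-to-core-grade correlation discussed in the text
print(f"shared variance = {shared_variance(r):.2f}")

# Hypothetical sample size, assumed here only to show the width of the interval.
low, high = fisher_confidence_interval(r, n=50)
print(f"approximate 95% CI for r: ({low:.2f}, {high:.2f})")
```

An interval this wide is why two correlations from small cohorts can look different and yet have overlapping confidence intervals.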

Additionally, and maybe more importantly, a SSF is also indicated by the K-12
cluster analysis data presented in chapter VI. Close inspection of the grade cluster
patterns during early elementary, but even more consistently at the middle and high
school levels, shows that student achievement is generally subject independent. Students
who do well in one subject generally do well in all subjects across grade-levels and years
of schooling, while students who score poorly in one subject generally score poorly in all
subjects. This corroborates the correlation data on a SSF, a student’s ability to negotiate
the social processes of being schooled such as attending class, participating, and being
well behaved. This is an important variable that teacher assigned subject-specific grades
may assess. If grades are considered a measure of both academic achievement and SSF,
rather than as a poor assessment of academic achievement alone, then a student’s early
drop in grades may signal an important intervention point for district and school level
data driven decision making.

If one combines the findings of past research on grades with the data presented
here, the idea of a success at school factor (SSF) is reasonable. If one accepts the findings
from the hodge-podge grading literature, that teachers use grades to assess much more
than academic knowledge, including attendance, participation, homework completion,
behavior, and extra credit assignments, then grades, which have been cast in a
pejorative light in the past research literature on assessments for this reason, may instead be useful as
an assessment of all of these factors combined, which would indicate a student’s ability to
conform to teacher, classroom and school social demands for the act of being schooled.
This “hidden curriculum” (Bracey, 1994; Wood, 1994) is the hypothesized success at
school factor (SSF). Additionally, a SSF may also indicate challenges faced by a student
outside of school that influence that student’s behavior and participation within the
school building. These challenges may be family and economically based, such as when the
student’s family undergoes a period of high stress due to the loss or switching of
jobs by a student’s parents, parental divorce, or other family strife. These challenges may
also be student centered, arising from a behavioral or learning disability that has gone
undetected.

The idea behind a SSF is not new. These issues have been well documented,
showing that from an early age and then throughout the schooling process, a child’s
success at school depends on the child functioning well in multiple domains, including
behavioral, attention, social and academic (Alexander et al., 2001; Flanagan et al., 2003;
Hamre & Pianta, 2001, 2005). All of these factors could contribute to an increase or
decrease in a student’s willingness to participate in the social processes of school. Hence,
the argument here is not for the existence of a SSF. While not articulated before as a 
“Success at School Factor”, the point that multiple social processes must be negotiated to 
succeed at school is well studied and known. The point that grades may be an assessment
of both academic knowledge and a student’s ability to negotiate the social processes of
school has also been well detailed in the past (Parsons, 1959). However, this point seems
to have been lost in the grading literature, as the focus over the past forty years seems to
have centered on the point that grades do not appear to be very reliable when it comes to
assessing academic knowledge. This study does not attempt to address why the recent
literature has focused so intently on the academic component of grades and ignored the
social component. What this study adds on this subject is the explicit
argument that grades may be an assessment of both academic knowledge and SSF. Although
the idea is not new, it is argued here that a SSF should be re-introduced in the discussion of
grades and the use of grades by researchers and practitioners. This study presents a viable
way to do so. 

The fact that grades appear to assess SSF is important due to the additional 
findings of this study that show that grade patterns are predictive of on-time graduation. 
Historically, standardized tests have lacked criterion validity measures that have linked 
high standardized test scores with on-time graduation, while grades, and specifically 
extremely low and failing grades, have been shown to correspond with higher rates of 
dropping out (Alexander et al., 2001; Allensworth, 2005; Allensworth & Easton, 2005;
Montes & Lehmann, 2004; Wood, 1994). This study has confirmed and extended the
findings that grades are useful predictors of student graduation. But a tension that exists
in the literature is the question as to why grades are predictive of dropping out if grades
are merely subjective hodge-podge measures that do not appear to be consistent from
teacher to teacher. The contention here is that grades are predictive of on-time
graduation, as well as future grading patterns, because grades are an assessment of both
academic knowledge and SSF. If a student has high academic aptitude but is unable to
negotiate the social processes of school such as showing up to class on time,
participating, doing their homework, and generally “playing the game” that is the
American schooling process, that student will receive a low grade. Those low grades are
predictive of future low grade trends as well as not graduating on time. This problem is
compounded for a student who also lacks the academic aptitude or foundational skills in
reading or mathematics, a lack that would also correspond to low scores on academic
achievement tests.

The Hargis Hypothesis

In reference to the second research question proposed, this study provides
evidence that supports the Hargis hypothesis, that early grade patterns predict future
grade patterns (Hargis, 1990; Kirschenbaum et al., 1971). Results of cluster analysis
show that early grades in elementary school are generally predictive of later grading
patterns, and, by middle and high school, grade patterns are highly predictive of future
grade patterns. Implicit in the Hargis hypothesis is that the assignment of early grades is
the cause of later grading patterns. The data presented in this study are ambiguous on this
issue. Hargis argues from the perspective of the teacher expectancy literature (Elashoff
& Snow, 1971; Hargis, 1990; Raudenbush, 1984; Rosenthal & Jacobsen, 1969; Spitz,
1999), intimating that teacher perceptions of potential student ability in the early
elementary grades are one of the main causes of future student success or failure at school,
and that those perceptions may be based on a multitude of factors outside of a student’s
actual ability, such as family socioeconomic status. Studies from the expectancy
literature, while much critiqued as discussed above, concluded that early teacher
perception of student ability, within the first few weeks of first grade, could be a major
determinant of future student outcomes.

Although these data support Hargis’ hypothesis, they do not provide evidence for
or against the expectancy literature’s hypothesized cause of these grade patterns. It may
be true that teacher expectancy is the main driver of student grade patterns and that
students who receive high grades in early elementary school are motivated into cycles of
higher grading patterns, while students who receive low grades due to the self-fulfilling
prophecies of early teacher deficit thinking become locked into a cycle of low grading
patterns, which is difficult to escape from. The data presented here do not provide enough
evidence to judge the cause of student grade patterning.

Nevertheless, while these long-term consistent student grade patterns may be due
to teacher expectancy, there is an alternative explanation: teachers instead may be very
adept at assessing a student’s ability to negotiate the social processes of school, a success
at school factor (SSF), from an early age, and grades may be an indication of that
assessment. Since we know that teacher perceptions of grades confirm the “hodge-podge”
grading practices, in that grades are an assessment of the multiple social norms of
schooling, such as attendance and participation, it is reasonable to believe that rather
than dooming children from an early age to patterns of low or high grades, as the
expectancy literature implies, teachers are accurately assessing a student’s ability to
negotiate the social processes of school, and that this assessment is reflected in the grades
that a student receives. A history of high grade patterns may indicate that a student is not
only acquiring academic knowledge, but also conforming to the social norms of the
schooling process. Conversely, a history of low grade patterns may indicate that a student
is not acquiring academic knowledge and may also not be learning how to negotiate the
social norms of schooling. It can be imagined that if a student has not learned how to
negotiate the system of school, and is not turning in homework, participating, or
attending class, then that student could fall quickly behind in their academic work, which
would predispose that student to lower grades and compounding problems.

Additionally, Hargis does not address the issue of students who do not conform
to the overall grading pattern trends: students who start with high grades that then
decrease over time, or students who start with low grades which then
increase over time. As shown in the data presented here, student grade patterns are generally
either high to high or low to low. For some smaller clusters, however, students may start
elementary school with low grades across subjects, and then show improvement over
time. Other clusters of students start with high grades across subjects, but then continue
to lose ground over time. These are the low-high and high-low clusters, respectively. In
contrast to the Hargis
hypothesis, the hypothesis presented here of a Success at School Factor (SSF) is able to
explain both sets of clusters. For students in the high-high clusters, those students may
have an aptitude for both academic knowledge and SSF, and their grades reflect this. For
low-low students the opposite may be true. In addition, for some students, the ability to
perform within the social process of school may be a necessary skill to acquire before the
acquisition of academic knowledge may take place. A logical conclusion is that for many
students the acquisition of the skills to perform well within the social norms and
requirements of the schooling process is a necessary step before they are able to
efficiently acquire academic knowledge.

As opposed to the Hargis hypothesis, a SSF also may explain the low-high and
high-low clusters. Early elementary teacher expectancy does not adequately explain how
these two clusters of students may exist. If teachers perceive early on that a student will
get high or low grades, and this expectancy is a self-fulfilling prophecy, then students
starting with high grades and then falling, or students starting with low grades and then
rising, are not well explained. If instead grades are an assessment of a student’s ability at
being schooled, a SSF, then students in a low-high cluster may initially be behind the rest
of their cohort in learning both the academic knowledge and the social norms of
schooling. However, after a time, students may learn what is expected of them and begin
to conform to the social processes of school. Additionally, these students may also be
developmentally behind their cohort, and, with time, may gain the ability to learn the
academic and social norm knowledge required to perform at school. It is interesting to
note that the low-high cluster diverges at the second grade from the low-low cluster, and
that few students in this dataset appear to “recover” in this way after elementary school. It
may be that there is a short window of time in which a student may “catch up” to peers in
the cohort. If this does not happen in early elementary school, then the number of
challenges continues to rise for the student, and they remain in the low-low cluster. This
idea is supported in the dropout literature and will be further discussed below.

The inverse may be true for the high-low cluster of students, who
progress well in early elementary school, but as they reach fourth grade their
grades begin to fall. This could be due to an increase in the requirements of academic
knowledge, transitioning from memorization to comprehension in both reading and
mathematics, such as the change from “learning to read” to “reading to learn” (NCES,
2001). Another explanation is the additional requirements for participation and behavior
as the academic press of the higher grade-levels increases and as students begin to enter
puberty. Interestingly, the early learning literature concentrates much attention on both
the second and fourth grades, referring to both as assessment points at which students
may have individual issues that can be helped with individualized educational plans and
more specific attention to their needs (Kamii & Joseph, 2003; Torgesen, 2002).

Thus, a Success at School Factor may be a better explanation for grade patterns
than the Hargis hypothesis. Rather than early teacher expectations of student ability
influencing a student’s entire future grading pattern, a student’s ability at negotiating the
social processes of school could be a contributor to a student’s future grade patterns as
teachers accurately assess a student’s SSF through grades. Rather than being cast in a
pejorative light, as in much of the grading literature to date, grades may instead be
useful as an assessment of a student’s SSF. That assessment is important when
considering data for decision making, a point discussed in more detail below.
Additionally, the evidence presented here indicates that a student might have a limited
time window in early elementary school to catch up in either academic knowledge or
SSF, and that by the end of elementary school, student grade patterns are generally set.
By then, students may be too far behind for it to be reasonable to expect them to catch up
to their cohort, which argues for early rather than later at-risk intervention strategies.

It must again be noted, however, that the data supporting a SSF are tenuous at best.
These results rest on a small and intact data sample. These limitations must be taken into
account when considering the veracity of the claims made here for the existence of a SSF.

Prediction of Not On Time Graduation (NOTG) 

The final research question addressed by this study was the extent to which
student grade patterns are predictive of qualitative student outcomes, such as graduating
or not graduating on time. The results presented in chapter VI show that by using
hierarchical cluster analysis to cluster the patterns of student grades, K-12 subject
specific grades are useful in predicting a student’s chances of not graduating on time
(NOTG). This prediction method appears to be comparable to past prediction methods of
students at-risk of dropping out and not graduating on time, such as the methods reported
by Gleason and Dynarski (2002). Moreover, while much of the “at-risk” literature on
student dropout prediction has focused on the high school level, and to a much lesser
extent on the middle school level (Gleason & Dynarski, 2002; Montes & Lehmann, 2004;
Rumberger, 1995), cluster analysis of grades appears to be superior in the accuracy of
predicting students at-risk of not graduating on time at the middle school and elementary
levels (see Table 18). Furthermore, while Gleason and Dynarski present a “best case”
at-risk predictor using regression composites (2002), as discussed in chapter II, principals
and school leaders rarely use regression statistics due to the complexity of regression
calculations; the violation of multiple assumptions of regression by using nested,
dependent and multicollinear district-level data; and their limited interest in estimating
the mean of the general population when they really want to know what may happen with
their students within the next few years (Creighton, 2001a). Accordingly, school leaders
rarely create regression composites of multiple student variables to predict a student’s
risk of dropping out. Instead they use individual variables to identify students as at-risk 
(Montes & Lehmann, 2004). The central point is that the method of cluster analysis of 
grades presented here may be more accurate, applicable, and “user friendly” in predicting 
students at risk of not graduating on time, considering that the method 1) is comparable to 
past prediction methods using high school level data, and appears superior at the middle 
and elementary levels; 2) does not have the assumption violation issues of regression 
analysis (multicollinearity, dependency of cases, and nested levels of data) and instead is 
made more robust when the data contains such underlying structure; 3) is applicable to 
entire cohort, school and district datasets rather than random samples; 4) uses grade data 
that is currently collected on students rather than requiring additional outside 
assessments; and 5) employs grades, which have face validity for teachers and parents, 
are collected from the earliest grade-levels, and have the potential to 
indicate specific subjects and grade-levels for possible intervention. In sum, cluster 
analysis appears to provide an advance over current at-risk prediction methods. It could 
be used for data driven decision making by school leaders to help direct the limited 
resources of a school district in service to students who may be experiencing challenges 
in school and deserve intervention. 

The overriding theme of this study is that grades are useful and predictive as 
assessments of student progress. It is not a novel idea that a student’s ability to negotiate 
the social processes of school matters for that student’s eventual life outcomes, such as 
on-time graduation. 
It has been well documented in multiple studies that a student’s risk of dropping 
out of school is not attributable to a single event, but rather appears to be a long-term 
process through which the accumulation of multiple challenges over time in a student’s 
schooling career continually builds up, culminating in a student’s decision to drop out 
(Bryk & Thum, 1989; Christenson & Thurlow, 2004; Delgado-Gaitan, 1988; Ensminger 
& Slusarcick, 1992; Gleason & Dynarski, 2002; Gutman et al., 2003; Emerson et al., 
2000; Randolph & Orthner, 2006; Rumberger, 1995). This process has been termed a 
“dynamic” or “life-course” process (Alexander et al., 2001; Emerson et al., 2000). 
Furthermore, in reference to school dropouts, it has been shown that social capital, social 
support, and emotional support of students at all levels of schooling are important for 
helping students gain the skills necessary to succeed in school, and that rather than being 
focused on academic knowledge, much of this need for social support is centered on 
helping students connect with the social processes of school in an effort to minimize a 
student’s risk of not graduating on time, and helping them to “play the game” and follow 
the rules of schooling (Barker, 2005; Croninger & Lee, 2001; Delgado-Gaitan, 1988; 
Hamre & Pianta, 2005; Knesting & Waldron, 2006; Miller, 2005; Zvoch, 2006). 

These social factors all relate to the Success at School Factor hypothesized in this 
study, and help to support the idea that the successful negotiation of the social processes 
of being schooled is an important component in students’ lives, since it may lead to 
greater participation in school, an increase in general academic achievement, and a higher 
probability of graduating on time. Moreover, a contention of this study is that an 
assessment of SSF appears to be a component of grades, and that grade patterns, when 
examined through cluster analysis, are useful in helping to determine students at-risk of 
not graduating on time, and offer an advance over current methods of at-risk prediction. 

While this study presents a novel method and use of teacher assigned, 
subject-specific grades in predicting students at-risk of not graduating on time, it does not address 
the issue of what should be done once students are identified. While outside the scope of 
this study, it is important to address this question since accurate identification is only the 
first step of many in helping to address the needs of students who may be experiencing 
difficulties with school. However, to date, little work has been done to systematically 
evaluate at-risk prevention programs. 

For most of the evidence, methodological problems persist which inhibit a robust 
evaluation of what works, such as biased groupings and estimates of effects, since 
randomized controlled trials are rarely performed in this area (Agodini & Dynarski, 2004; 
Lehr et al., 2003). Nevertheless, what the literature indicates is that historically, most 
dropout prevention programs appear to not reduce student dropouts (Dynarski & Gleason, 
2002). As reviewed by Dynarski and Gleason (2002) and Lehr et al. (2003), these 
programs mostly occur at the high school level and consist of helping students build 
self-esteem, overcome personal and family issues and increase attendance through periodic 
counseling; consist of the creation of smaller school settings; or provide tutoring or 
mentoring services. Similar programs at the middle school level have had somewhat 
more of an impact, but as discussed above, the accuracy of identification of students at 
risk of dropping out using middle school level data has been low and problematic to date. 
Hence, any program that appears to work using middle school level data may have 
“worked” only to the extent that the majority of the students identified for at-risk 
interventions were mis-identified originally as being students at risk of dropping out. 

Acknowledging that much more high-quality work is needed in the evaluation of 
dropout prevention programs before any one individual program can be recommended 
over another (Dynarski & Gleason, 2002; Lehr et al., 2003), recent literature has begun to 
urge a shift from a deficit model of attempting to prevent dropouts, to a more positive 
model of promoting and encouraging successful school completion (Christenson & 
Thurlow, 2004). From the perspective of the results presented here in chapter VI, dropout 
prevention programs should focus more on the earlier grade levels, rather than almost 
exclusively on the high school level. To this end, a recent study showed that first-grade 
students with known characteristics of school dropout taught in classrooms with multiple 
dimensions of support (including behavioral, attention, academic and social) 
scored higher on academic and social achievement scales than comparable children who 
attended classrooms with less supportive environments (Hamre & Pianta, 2005). 
Interventions such as this, which provide early assistance for both the academic and 
social needs of children, provide an attractive future avenue for intervention studies and 
for district strategies to help students learn and ultimately graduate on time. 

Cluster Analysis of Subject-Specific Teacher Assigned Grades 

Chapter VI presents a novel application of cluster analysis for the study and use of 
subject-specific teacher assigned grades. Patterns of longitudinal student grades appear to 
be predictive of future student grades and qualitative outcomes, such as on-time 
graduation. In addition, specific sub-clusters of student grade patterns suggest early 
intervention points in students’ careers in schools for students who may be at risk of low 
school performance and eventually not graduating on-time. 
Cluster analysis has been rarely used in education to date. This may be due to the 
perception that cluster analysis, a descriptive multivariate statistic, is a less sophisticated 
procedure for statisticians than other multivariate statistics such as linear regression, 
because significance tests and confidence intervals are not readily applicable to 
cluster analysis (Lorr, 1983; Romesburg, 1984). However, if one conceives of school and 
district-level data as large and historically untapped databases that are highly 
multicollinear, interdependent, and nested, then cluster analysis as a data mining 
procedure becomes more attractive for researchers and practitioners faced with the 
avalanche of student-level data now collected on students at every level. The 
attractiveness of cluster analysis rises further when considering that these same datasets 
are problematic for use in regression statistics, due to these same issues with 
multicollinearity and dependence of cases. Student data is messy and complex. Cluster 
analysis can bring order and structure to that data, revealing previously unknown patterns 
in an effort to help drive decision making based on that data. 

Combining cluster analysis with an Eisenplot in the analysis of educational data, 
as detailed in the methods and chapter VI, is also a novel application of this method. The 
majority of quantitative methods rely on aggregation of data to the mean, and the 
reporting of a generalized trend. For large scale studies that wish to estimate the mean of 
a population of students in a state or a nation, generalization to the mean is desired. 
However, at the school and district level, reducing data trends to the mean necessarily 
requires the loss of information and an increase in the theoretical “distance” between the 
generalized trends and the individuals for whom the data could be used for decision 
making (Hayman et al., 1979). To tease out overall patterns and trends, this loss of data 
in exchange for an overall mean has been deemed as acceptable in the literature, and 
newer statistical procedures have been created to recover or control for specific trends in 
data through deviation from the mean in many high-level statistical procedures, including 
all forms of regression analysis. However, these procedures all come with additional 
issues of controlling for assumption violations in a dataset. Nevertheless, the fine 
granularity of a dataset is lost with traditional inferential statistics as the statistics 
aggregate to the mean. This becomes extremely important when considering the use of 
this data for decision making for individual schools and districts. For a large enough 
dataset, each individual’s data in any situation should theoretically be unique. Hence, if 
decisions are to be made about individuals based on their data, especially high stakes 
decisions in settings such as education, decisions should be based on the entire pattern of 
data of an individual to date in the system leaving the individual data points intact and 
available for review and alternative analysis. 

At the other end of the spectrum, far removed from aggregating all data to the 
mean, is the practice of relying on individual data points to make high-stakes decisions, 
often witnessed in education as students are assigned to at-risk pull-out programs, 
retention, or remedial services based on one, or just a few data points, such as a single 
grade, test, or categorical variable (Coburn & Talbert, 2006; Creighton, 2001a). The 
logical middle-ground between these two extremes of generalizing to the mean or basing 
decisions on individual datapoints, is to acknowledge the qualitative literature and strive 
to produce deep and rich datasets that begin to bring together the best of quantitative and 
qualitative theories of knowing, bridging the divide and blurring the lines between the 
two. Finding a middle ground between quantitative generalizable statistical findings and 
qualitative context-localized deep descriptions has been much discussed in the literature 
(Madey, 1982; Onwuegbuzie & Leech, 2005; Shaffer & Serlin, 2004). Cluster analysis 
with the inclusion of an Eisenplot is a start down this path from the quantitative side. 

While the cluster analysis procedure detailed here does use means and 
correlations, it begins to bridge these divides between the loss of data to the mean versus 
examination of single data points, as well as the divide between quantitative generalized 
data and qualitative context-localized data, in four main ways. First, cluster analysis 
employs the use of numerical datasets, and is thus considered a quantitative method 
(Lorr, 1983; Rencher, 2002; Romesburg, 1984). Second, cluster analysis preserves the 
entire list of all cases, rather than aggregating all cases into a single mean, reordering a 
list and giving it a taxonomic structure that places each case proximal to other similar 
cases in the list based on each case’s data pattern. For schools, rather than aggregating 
achievement data to a mean for the school or district, cluster analysis preserves the list of 
cases that would go into such a mean, and gives the order of the list meaning. Third, a 
dendrogram, or cluster tree, allows one to visualize the organization of clusters and 
magnitude of similarity between clusters, revealing more about each case rather than less. 
Fourth, with the inclusion of an Eisenplot, every datapoint for each case for each variable 
is displayed in a context based on the similarity of each case’s data pattern to each other 
case’s data pattern in the dataset. This provides for a deep and rich display which makes 
obvious and disaggregates every datapoint used in the cluster analysis, revealing and 
maintaining the data of each individual while allowing for pattern recognition. In this 
way, cluster analysis is a quantitative method that employs some of the aspects of a 
qualitative method in creating a deeper and thicker description. It is a visual and intuitive 
new data analysis tool for educators. 
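
To make the reordering idea concrete, the following is a minimal Python sketch of agglomerative hierarchical clustering that returns the cases in dendrogram leaf order. It is illustrative only: the Euclidean distance, the average-linkage rule, the function name cluster_order, and the four hypothetical students are assumptions introduced here, not the procedure or data used in chapter VI, and the sketch does not draw the dendrogram or the Eisenplot.

```python
from itertools import combinations
from math import dist


def cluster_order(rows):
    """Return row indices in leaf order after average-linkage clustering.

    Euclidean distance and average linkage are illustrative assumptions.
    """
    # Start with every case (student) in its own cluster.
    clusters = [[i] for i in range(len(rows))]

    def avg_distance(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the two most similar clusters and merge them, keeping members adjacent.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]


# Hypothetical z-scored grade patterns (one row per student, one column per subject).
students = [
    [1.0, 1.1, 0.9],     # student 0: consistently above the mean
    [-1.0, -0.9, -1.2],  # student 1: consistently below the mean
    [0.9, 1.0, 1.0],     # student 2: similar to student 0
    [-1.1, -1.0, -1.0],  # student 3: similar to student 1
]
order = cluster_order(students)
for i in order:
    # Every data point stays visible; only the order of the list changes.
    print(i, students[i])
```

The list of cases is never collapsed into a mean. Each student's full pattern is printed, and students with similar patterns end up next to each other, which is the property the second and fourth points above rely on.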

Cluster analysis requires the standardization, or z-scoring, of all variables within a 
dataset, to compensate for the overweighting in the dataspace that one variable may have 
that would distort the clustering patterns (Lorr, 1983; Romesburg, 1984). Z-scoring 
then allows for a more “apples to apples” comparison. Moreover, in clustering grade data 
for this study, the necessity to z-score all of the data provided unexpected benefits. First, 
because each subject-specific grade variable is normalized through z-scoring, grade 
inflation is controlled for. Second, grade data has rarely been z-scored in the literature; 
however, z-scoring of grades, especially at the early elementary stage, may be an 
important innovation. If one considers that it has been shown that low or failing grades at 
the high school level are predictive of student dropout (Alexander et al., 2001; 
Allensworth, 2005; Allensworth & Easton, 2005), but that the distribution of grades may 
be narrower and skewed towards higher grades in early elementary school (no students 
failed any subject at any grade-level before 6th grade in the dataset presented here) then 
examining grades as a z-scored distribution rather than as a fixed scale is important. As 
shown in chapter VI, low grades as early as first and second grade are generally 
predictive of future student grade patterns. However, since the grades are z-scored, “low” 
is relative to the distribution of each subject-specific and grade-level variable. This 
results in the ability of cluster analysis to reveal “low grading” patterns that would not be 
readily apparent if the grades were clustered based on the 4-point grading scale, such as 
those in the lower cluster of Figure 19, in which the low grades at the early elementary 
level may only be as low as a B or a C. As shown in Figure 25 in the examination of the 
four clusters of high-high, high-low, low-high and low-low, a small change in overall 
grades at the elementary level, from a B+ to B, is significant in predicting the future 
outcomes of the students in the cluster, and most likely has gone unexamined in the past. 
Z-scoring of grades has resulted in the conclusion that not only do 
students with low grades in early elementary school appear to continue to receive low 
grades throughout the rest of their career in a district, but that “low grades” should be 
defined as grades that skew more towards -1 standard deviation across a standardized 
graded-subject variable, rather than on absolute grades, such as a C, D or F. 
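
As a hedged illustration of this relative definition of “low,” the short Python sketch below z-scores a single subject-by-grade-level column and flags the students at or below -1 standard deviation. The grades, the function name z_score_column, the use of the population standard deviation, and the -1.0 cutoff applied as a hard threshold are all assumptions made for the example, not values or code from this study.

```python
from statistics import mean, pstdev


def z_score_column(grades):
    """Standardize one subject-by-grade-level column; None marks a missing grade."""
    present = [g for g in grades if g is not None]
    mu = mean(present)
    sigma = pstdev(present)  # population SD is an assumption of this sketch
    if sigma == 0:
        # Every student received the same grade, so there is no relative information.
        return [None if g is None else 0.0 for g in grades]
    return [None if g is None else (g - mu) / sigma for g in grades]


# Hypothetical first-grade reading grades on a 4-point scale (A = 4.0, B = 3.0).
first_grade_reading = [4.0, 3.7, 3.3, 4.0, 3.0, 3.7]
z = z_score_column(first_grade_reading)

# Students whose grade sits at or below -1 standard deviation for this column.
low = [i for i, v in enumerate(z) if v is not None and v <= -1.0]
print(low)
```

In this made-up column no student is below a B, yet the student with the B is flagged, because “low” is judged against the distribution of that one variable rather than against the fixed grading scale.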

Cluster analysis in combination with an Eisenplot provides an additional 
innovation when examining longitudinal subject-specific grades: a visual 
method to assess the course enrollment patterns of all students across a dataset. In the 
past, a method has not been readily available which would allow school leaders to 
disaggregate and examine all of the course taking patterns of all of their students, and 
compare the differences in those patterns between students who appear to succeed in 
school and those who do not. The method presented here allows one to do just that. As 
mentioned in the presentation of the results for Figure 19 in chapter VI, by examining the 
patterns of columns of data at the high school level for students in the upper cluster in 
comparison to the lower cluster, students in the lower cluster appear to take fewer classes 
overall than students in the higher cluster, and they are awarded lower grades for those 
fewer courses. Additionally, because each grade-level in the cluster analysis contains a 
repeating order of subjects from left to right (from core subjects such as mathematics, 
English and science, to more non-core subjects, such as band, physical education and art) 
“columns” of contiguous data patterns running vertically can be identified for students in 
the upper cluster throughout high school, beginning to dissolve into more non-core 
subjects only by grade 12. Conversely, for students in the lower cluster, not only is it 
evident by the color-block pattern that they are taking fewer courses than the upper 
cluster, but because there is more “scatter” in their patterns across the repeating pattern of 
subjects, the students in the lower cluster enrolled in many more non-core courses than 
students in the upper cluster. This is a significant finding considering the research to date 
on the higher rate of success, graduation and college attendance for students who take 
core courses throughout their careers in high school (Adelman, 1999; Ayalon, 2006; 
Gamoran & Hannigan, 2000; Girotto & Peterson, 1999; Meyer, 1999; Trusty, 2002; 
Woods, 1995). 

Grades and Data Driven Decision Making - Conclusion 

This study has shown evidence that grades are useful when considering data 
driven decision making, that grades and standardized assessments may be converging 
over time for one of the two districts, and that cluster analysis is a new and useful method 
for analyzing patterns of student data to predict future outcomes. This analysis may be an 
advance over past practices that is more useful, has fewer assumption violations, and has 
more face validity than past methods. The literature to date on data driven decision 
making indicates that when teachers and school leaders collaborate around student-level 
data with a focus on improvement of educational practice, the process of open 
communication, dialogue and a focus on student’s performance to date in the system is 
helpful in encouraging school success and an increase in professional collaboration 
amongst the staff (Bernhardt, 2004; Coburn & Talbert, 2006; Halverson et al., 2005; Kerr 
et al., 2006; Thorn, 2002; Wayman & Stringfield, 2006a, 2006b; V. M. Young, 2006). 
The contention of this study is that grades, and analysis of grades through cluster 
analysis, may be useful for data driven decision making in schools and school districts. 
Rather than have schools make two copies of a report card and send one home to the 
parents and one into storage in the basement, school leaders should bring that data back 
up out of the basement and put it to use for data driven decision making for multiple 
reasons. First, grades are already generated as part of the system, and so in a way they 
could be seen as “free” or low cost, especially in comparison to the current movement in 
many districts across the nation discussed in chapter II to add increasing levels of 
periodic assessments to help predict state assessments, spending both money and 
instructional time on what may be unnecessary additional test preparation. Second, 
grades appear to be predictive of future student grade patterns and on-time graduation, 
and for one district in this study, the correlation between grades and a standardized 
assessment may be rising over time. Hence, rather than ignore the grading system, which 
schools already devote enormous amounts of time to generating, that data can be used 
more efficiently by including it in the data driven decision making process, to analyze the 
performance of each student, predict future performance, and help direct the limited 
resources of a school district to students who could most benefit. Third, the method 
presented here utilizes data that schools already possess, and mirrors what could be 
considered a “typical” district dataset. Fourth, because this method uses data that is 
already present in every school district, the two largest hurdles to practitioners using 
grades and cluster analysis are the extensive amount of effort required to input grades 
into an electronic database and teaching practitioners how to conduct and read cluster 
analysis and Eisenplot outputs. With the continuing increase in district use of electronic 
databases to store all of their data (Streifer, 2002; Wayman, 2005; Wayman & 
Stringfield, 2006b; Wayman et al., 2004), the issue of transferring grade data from paper 
report cards into an electronic database disappears. Additionally, since cluster analysis 
requires much less attention to the traditionally problematic issues of multicollinearity and 
case dependence of regression analysis, analysis by district leaders using clustering and 
Eisenplots should be less difficult than most other statistical analyses once they are 
trained on how to read a cluster tree and an Eisenplot. And fifth, since grades have face 
validity with teachers, parents and students, using grades in addition to standardized 
tests for data driven decision making may help increase buy-in on data-based decisions 
from these multiple stakeholders. 

The final point is that analysis of long-term grading patterns should be considered 
the job of school leaders and district central office staff, not teachers, thus making this 
analysis an administrative and leadership issue. A teacher’s day is already full, and 
adding the requirement to cluster or pattern their students by classroom would add 
unnecessary work. Additionally, a teacher must be concerned with her entire class and 
the near-term needs of all of her students, working to improve daily instruction for 
tomorrow. It is the job of school and district administrators to provide the data analysis 
for teachers so that they may see the connections in their practice throughout a school 
system and how each teacher’s practice influences the outcomes for students over time. 
To keep from burdening teachers with an ever increasing array of responsibilities, school 
leaders must provide the finished analysis for discussion, rather than requiring teachers to 
perform the analysis themselves. Also, since the addition of data only increases the 
granularity of clusters within a clustered dataset, it would be unreasonable to ask 
individual teachers, or even individual schools to cluster their data. Rather, districts 
should cluster all of their data to create the largest and most robust dataset available. The 
methods detailed here provide a means to focus in on the long-term district-wide trends 
of student grade patterns, and at the same time pinpoint specific time and data points for 
interventions. If a timepoint, subject, grade level, or specific cluster of students is 
identified as in need of assistance, that assistance would take political power and 
financial backing to implement given the limited resources of a school district. Only 
central office staff and school leaders have such power, as well as a school and district- 
wide vision that could be enhanced through the addition of grades and cluster analysis to 
ongoing efforts at data driven decision making. 




Bowers, A.J. (2007) 



APPENDICES 





APPENDIX A 


Table 19: Course names and percentages of students who attended each specific course 
for each subject grouping during 10th grade semester 2, full dataset. 
(% = % of enrolled students) 

Mathematics                   English                       Science
Class Name                %   Class Name                %   Class Name                %
Not Enrolled          24.7%   English 10 Taken      28.3%   Biology               36.8%
Int Math II           12.2%   Not Enrolled          23.9%   Not Enrolled          25.2%
Geometry               7.8%   Lang Arts II          19.7%   Earth Sci             11.9%
Math II                7.8%   English II R           5.3%   Env Sci                8.0%
Math 1                 6.9%   Hon Eng 10             4.4%   Earth Science          5.0%
Algebra B              5.3%   English 10 Honors      4.2%   Practical Biology      4.2%
Pre Alg                5.3%   English II H           3.1%   Life Sci               1.1%
Algebra 1              3.3%   English 9              1.7%   Anat/Phys              0.8%
Algebra II             3.1%   Lang Arts III          1.4%   Physical Science       0.8%
App Math II            2.8%   Eng 10                 0.8%   Life Science           0.6%
Shop Math              2.5%   English II             0.6%   Basic Chemistry        0.3%
Algebra A              2.2%   ESL - English 3        0.6%   Basic Earth Science    0.3%
Int Math II-H          1.9%   IEP English            0.6%   Basic Human Science    0.3%
App Math 1             1.4%   Lang Arts III H        0.6%   Biology 1              0.3%
Basic Geometry         1.1%   Basic English          0.3%   Biology 2              0.3%
Int Math III           1.1%   Basic English 2        0.3%   Chem in the communit   0.3%
App Math III           0.8%   British Lit            0.3%   Chemistry              0.3%
ESL - Math             0.8%   Composition            0.3%   ESL Science            0.3%
Math                   0.8%   English                0.3%   Fd Biology             0.3%
Algebra 1              0.6%   English 10B            0.3%   Gen Bio Sci            0.3%
Applied Math           0.6%   English 2S             0.3%   Gen Sci                0.3%
IEP Math               0.6%   English Lang Studies   0.3%   General Science 1      0.3%
Int Math III-H         0.6%   English Skills         0.3%   Hon Eng 9              0.3%
Pre-Algebra            0.6%   English V              0.3%   Phy/Earth Science      0.3%
Alg 2nd half           0.3%   ESL - English 2        0.3%   Phyiscal Science       0.3%
Alg Essentials II      0.3%   ESL - English A        0.3%   Phys/Earth Science     0.3%
Alg I                  0.3%   IEP Eng                0.3%   Physical/Earth Sci     0.3%
Alg II/Trig            0.3%   IL Lit 10              0.3%   Physics                0.3%
Algebra                0.3%   Lang Arts 1            0.3%   Sci Concept 2          0.3%
Algebra 1-2            0.3%   Language Arts 10       0.3%   Y Science 2A &B        0.3%
Basic Geom             0.3%   Look at Lit            0.3%
Cons Math              0.3%   Y English 2A &B        0.3%
Cons. Math             0.3%
Consumer Math          0.3%
E-Basic Math           0.3%
General Math           0.3%
IEP Math II            0.3%
Int Math               0.3%
Int Math I             0.3%
Int Math-H             0.3%
Integ Math 1           0.3%
Math 9                 0.3%
Res Math               0.3%

Foreign Language              Social Studies                Government
Class Name                %   Class Name                %   Class Name                %
Not Enrolled          79.4%   US History            44.0%   Not Enrolled          94.7%
Spanish II            10.8%   Not Enrolled          29.0%   Gov/Cons Econ          2.5%
Spanish 1              5.3%   Wld Hist              18.1%   Government             1.7%
Spanish 2/3            1.4%   State History          2.5%   Am Government          0.3%
Spanish 1              0.8%   Psychology             1.4%   American Govt          0.3%
Spanish III            0.6%   ESL US History         0.6%   Geography/Econ/Civic   0.3%
French 1               0.3%   World History          0.6%   Pratical Law/Econ      0.3%
French I               0.3%   Am Wrld Studies        0.3%
French II              0.3%   Amer. History          0.3%
German I               0.3%   Cont Global            0.3%
Spanish I H            0.3%   Contempt WWId          0.3%
Spanish I-H            0.3%   ESL History            0.3%
                              ESL U.S. History       0.3%
                              Global Issues          0.3%
                              History                0.3%
                              IEP Hist               0.3%
                              IEP US Hstory          0.3%
                              Mod World History      0.3%
                              Soc Studies            0.3%
                              Sociology              0.3%
                              Y-Geography 2A & IB    0.3%

Economics                     Band                          Physical Education
Class Name                %   Class Name                %   Class Name                %
Not Enrolled          93.9%   Not Enrolled          84.4%   Not Enrolled          67.2%
Intro to Bus           1.4%   Band                   5.3%   Strg/Cond              7.8%
Economics              1.1%   Band-HS                2.8%   Health                 7.2%
Acct                   0.8%   Concert Choir 1        1.4%   P.E.                   3.9%
Accounting             0.6%   Jazz Band              1.4%   Phys Ed                3.6%
Prin of Mkting         0.6%   Choir-HS               1.1%   Adv PE                 3.3%
Retailing              0.6%   Symphonic Band         1.1%   Advanced PE            1.1%
Accounting I           0.3%   Concert Choir          0.8%   PE                     1.1%
Accounting III & IV    0.3%   Choir                  0.6%   Physical Edu           1.1%
Bits Business and fi   0.3%   Cone Choir             0.3%   Team Sports            1.1%
Intro Bus              0.3%   Concert                0.3%   PE Swim                0.6%
                              GHS Singers            0.3%   ADV PE                 0.3%
                              Men's Choir            0.3%   Advaned PE             0.3%
                                                            Dance Fitness          0.3%
                                                            Life Rec Sports        0.3%
                                                            PE 10                  0.3%
                                                            Phys Educ              0.3%
                                                            Team Sports/Health     0.3%

Computers                     Life Skills                   Art
Class Name                %   Class Name                %   Class Name                %
Not Enrolled          75.0%   Not Enrolled          76.3%   Not Enrolled          82.4%
Bus Tech 1             8.9%   Mach Woods             5.4%   Art 2/3                4.2%
Computer App           4.4%   Arch Draw              2.9%   Art Foundations        4.2%
Comp App               2.8%   Typing                 2.6%   Art                    1.7%
Computer2/Careers 2    1.9%   Mech Draw              2.3%   Art 11                 1.7%
Bus Tech II            1.7%   Bench Woods            1.1%   Illustration 1         1.1%
Comp Multimedia        1.4%   Gen Metals             1.1%   Acting Theater         0.8%
Computer1/Careers 1    1.1%   NTR/WP                 1.1%   Art 1                  0.8%
Bits ATC               0.3%   Gormet Food            0.9%   Studio Art             0.6%
Cadet Media            0.3%   Personal Living        0.6%   Acting Theatre         0.3%
Com App                0.3%   Woodshop 1             0.6%   Art Studio             0.3%
Com Multimedia         0.3%   Adv Mech Drawing       0.3%   B. Theater             0.3%
Comp/Multimedia        0.3%   Arch Drawing           0.3%   Comp/Art Skills        0.3%
Computer Apps          0.3%   Basic Foods            0.3%   Drama                  0.3%
Computer Pro           0.3%   Basic Skills           0.3%   Draw 1                 0.3%
Computer1/Careers1     0.3%   Bench Metal            0.3%   Illustration 1 H       0.3%
Health                 0.3%   Cadet Media            0.3%   Intro to Art           0.3%
Intro Info Processin   0.3%   Cadet Teacher          0.3%   Theater                0.3%
                              Communication Arts     0.3%
                              Coop Voc               0.3%
                              Drafting               0.3%
                              Driver Ed              0.3%
                              Human Resources Adml   0.3%
                              Ind Living Skills      0.3%
                              Keyboarding            0.3%
                              Keyboarding 1          0.3%
                              Mech Drawing           0.3%
                              Retailing              0.3%
                              Study Skills           0.3%


APPENDIX B 


Fisher confidence intervals were calculated for each correlation of grades to ACT 
in Figures 14 and 16 through 18. Confidence intervals were calculated as previously 
described (Howell, 2002). Fisher r to z transformations were estimated using equation 8 
and back transformed using equation 9. 

$$ z_r = \frac{1}{2}\left[\log_e(1 + r) - \log_e(1 - r)\right] $$    Equation 8

$$ r = \frac{e^{2 z_r} - 1}{e^{2 z_r} + 1} $$    Equation 9
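
The calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the code used for Table 20: the standard error 1/sqrt(n - 3) is the usual one for the Fisher transformation, the 1.96 critical value assumes a 95% interval, and the function name fisher_ci, the correlation of 0.60, and the sample size of 28 are hypothetical values chosen only for the example.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Confidence interval for a correlation r from n cases via Fisher's r to z.

    z_crit=1.96 assumes a 95% interval; n must be greater than 3.
    """
    # Forward transformation of r to z (Equation 8).
    z = 0.5 * (math.log(1 + r) - math.log(1 - r))
    # Standard error of z for a sample of n cases.
    se = 1 / math.sqrt(n - 3)
    lower_z, upper_z = z - z_crit * se, z + z_crit * se

    def back(v):
        # Back transformation of z to r (Equation 9).
        return (math.exp(2 * v) - 1) / (math.exp(2 * v) + 1)

    return back(lower_z), back(upper_z)


# Hypothetical example: a grade-to-ACT correlation of 0.60 from 28 students.
lower, upper = fisher_ci(0.60, 28)
print(round(lower, 3), round(upper, 3))
```

Because the interval is built on the z scale and then back transformed, it is not symmetric around r, and it widens quickly as the number of cases falls.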

Pairs of confidence intervals (lower and upper) are listed in Table 20 for each correlation
in Figures 14 and 16 through 18. All of the confidence intervals overlap, indicating little
to no statistical difference between the correlations.
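
The calculation described above can be sketched in Python. This is a minimal illustration rather than the procedure actually run for this study: the standard error 1/sqrt(n - 3), the 95% confidence level, and the function names are assumptions not stated in this appendix, and the values r = 0.5 and n = 30 are hypothetical. The overlap check uses the West Oak HS GPA to ACT composite intervals for 1994 and 2006 as listed in Table 20.

```python
import math
from statistics import NormalDist


def fisher_z(r):
    """Fisher r-to-z transformation (equation 8)."""
    return 0.5 * (math.log(1 + r) - math.log(1 - r))


def inverse_fisher_z(z):
    """Back-transformation from z to r (the inverse of equation 8)."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)


def fisher_ci(r, n, confidence=0.95):
    """Return (lower, upper) confidence limits for a correlation r from n pairs."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = fisher_z(r)
    se = 1 / math.sqrt(n - 3)  # assumed standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return inverse_fisher_z(z - z_crit * se), inverse_fisher_z(z + z_crit * se)


def intervals_overlap(a, b):
    """True when two (lower, upper) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical correlation and sample size, for illustration only.
print(fisher_ci(0.5, 30))

# West Oak HS GPA vs. ACT composite, 1994 and 2006 (values from Table 20).
print(intervals_overlap((0.165, 0.835), (0.385, 0.800)))
```

Because the interval is built on the z scale and then back-transformed, it is generally not symmetric around r, which is why the table reports separate lower and upper limits.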
Table 20: Fisher confidence intervals for correlation comparisons

                       COMP            MATH            ENG             READ            SCI
                   Lower   Upper   Lower   Upper   Lower   Upper   Lower   Upper   Lower   Upper
West Oak
  HSGPA 1994       0.165   0.835  -0.025   0.800   0.464   0.915   0.145   0.833   0.140   0.830
  HSGPA 2006       0.385   0.800   0.435   0.825   0.175   0.675   0.340   0.785   0.190   0.715
South Pine
  HSGPA 1994       0.295   0.770   0.275   0.760   0.007   0.620   0.145   0.700   0.315   0.780
  HSGPA 2006       0.390   0.785   0.240   0.715   0.315   0.755   0.375   0.780   0.200   0.695
West Oak 1994
  Math HS GPA      0.075   0.809  -0.125   0.727   0.265   0.867  -0.087   0.744   0.058   0.802
  Eng HS GPA       0.206   0.850  -0.223   0.676   0.540   0.929   0.282   0.871  -0.042   0.764
  Sci HS GPA       0.206   0.850  -0.103   0.737   0.440   0.909   0.119   0.823   0.057   0.802
West Oak 2006
  Math HS GPA      0.130   0.677   0.272   0.755   0.019   0.619  -0.021   0.594  -0.051   0.574
  Eng HS GPA       0.334   0.778   0.311   0.772   0.086   0.659   0.247   0.743   0.262   0.749
  Sci HS GPA       0.235   0.732   0.312   0.773  -0.014   0.598   0.203   0.721   0.156   0.697
South Pine 1994
  Math HS GPA      0.111   0.679   0.246   0.747  -0.123   0.531   0.012   0.622   0.100   0.673
  Eng HS GPA       0.250   0.749   0.140   0.694   0.111   0.679   0.161   0.706   0.165   0.708
  Sci HS GPA       0.246   0.747   0.317   0.779  -0.074   0.566   0.087   0.666   0.419   0.822
South Pine 2006
  Math HS GPA      0.473   0.824   0.455   0.816   0.291   0.741   0.403   0.793   0.323   0.757
  Eng HS GPA       0.339   0.764   0.156   0.670   0.328   0.759   0.319   0.755   0.091   0.632
  Sci HS GPA       0.432   0.806   0.361   0.774   0.324   0.757   0.333   0.761   0.283   0.737
West Oak 1994
  Math 10 S2      -0.251   0.704  -0.216   0.722  -0.119   0.766  -0.281   0.687  -0.452   0.568
  Eng 10 S2        0.107   0.819  -0.334   0.605   0.231   0.857   0.194   0.847  -0.106   0.736
  Sci 10 S2        0.246   0.871  -0.114   0.750   0.175   0.852   0.125   0.837   0.000   0.796
West Oak 2006
  Math 10 S2      -0.063   0.566   0.193   0.722  -0.085   0.559  -0.193   0.478  -0.206   0.467
  Eng 10 S2        0.181   0.710   0.027   0.630  -0.108   0.542   0.202   0.726   0.107   0.677
  Sci 10 S2       -0.054   0.571   0.069   0.655  -0.255   0.426  -0.030   0.595  -0.111   0.540
South Pine 1994
  Math 10 S2       0.012   0.622   0.179   0.715  -0.076   0.564  -0.205   0.468  -0.054   0.579
  Eng 10 S2        0.189   0.719   0.057   0.649   0.003   0.616   0.119   0.684   0.013   0.622
  Sci 10 S2        0.135   0.698   0.261   0.760  -0.018   0.611   0.011   0.629   0.025   0.637
South Pine 2006
  Math 10 S2       0.254   0.723   0.319   0.755   0.177   0.681   0.190   0.689   0.085   0.628
  Eng 10 S2        0.498   0.834   0.253   0.722   0.507   0.837   0.403   0.793   0.287   0.739
  Sci 10 S2        0.484   0.828   0.350   0.769   0.460   0.818   0.389   0.787   0.310   0.750
Figure 27: Eisenplot of hierarchical clustering of teacher assigned subject-specific
grades, K-6.


Cluster analysis of student grades (following page) indicates that for the K-6 dataset,
student grade patterns cluster into two main clusters: those who eventually graduate on
time, and a high percentage of students who do not graduate on time (NOTG). Each
student is aligned along the vertical axis, with subjects by grade level aligned along the
horizontal axis. This figure is presented in color. Z-scored student grades are
represented by a heat map (center), with higher grades indicated by an increasing
intensity of red, lower grades indicated by an increasing intensity of blue, the mean
indicated by grey, and white indicating no data. Hierarchical clusters are represented by
a dendrogram (left), with a scale in standard deviation units for the clusters across the
hyperdimensional dataspace (bottom left). The dichotomous categorical variable NOTG is
represented by black bars (right). The dashed green line through the center heat map
indicates the division line between the two major clusters in the K-6 dataset. School and
grade level are indicated along the top horizontal axis (center top). Grade level increases
left to right, starting with Kindergarten (K), followed by Elementary, which includes
grades 1, 2, 3, 4, 5, and 6. Within each grade level, subjects are listed in a repeating
pattern as follows: K - mathematics, speaking, writing, reading; Elementary - mathematics,
reading, writing, spelling, handwriting, science, social studies; 6th - reading,
mathematics, English, science, band, social studies, physical education, art.
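
The processing behind a plot of this kind (z-scoring each subject column, then hierarchical clustering of students) can be sketched as follows. This is an illustrative sketch only, not the software or settings used to produce the figure: the grade values are hypothetical, and the average-linkage rule, the root-mean-square distance, and the way missing grades are skipped are all assumptions chosen to keep the example self-contained.

```python
import math
from statistics import mean, pstdev


def z_score_columns(rows):
    """Z-score each column (one subject at one grade level); None marks no data."""
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        vals = [r[j] for r in rows if r[j] is not None]
        mu, sd = mean(vals), pstdev(vals)
        for r in out:
            if r[j] is not None:
                r[j] = (r[j] - mu) / sd if sd > 0 else 0.0
    return out


def distance(a, b):
    """Root-mean-square difference over columns where both students have data."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))


def two_clusters(rows):
    """Average-linkage agglomerative clustering, stopped at two clusters."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > 2:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ds = [distance(rows[p], rows[q])
                      for p in clusters[i] for q in clusters[j]]
                d = sum(ds) / len(ds)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical grades on a 4.0 scale: one row per student, one column per subject.
grades = [
    [4.0, 3.7, 4.0, 3.3],
    [3.7, 4.0, None, 3.7],
    [3.3, 3.7, 3.7, 4.0],
    [1.7, 2.0, 1.3, 2.0],
    [2.0, 1.3, 1.7, None],
    [1.3, 1.7, 2.0, 1.7],
]
z = z_score_columns(grades)
print(two_clusters(z))
```

In the figure, the z-scored matrix corresponds to the heat map, the sequence of merges to the dendrogram, and the final two-way split to the dashed division line.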
Figure 28: Eisenplot of hierarchical clustering of teacher assigned subject-specific
grades, K-1.


Cluster analysis of student grades (following page) indicates that for the K-1 dataset,
student grade patterns cluster into two main clusters: those who eventually graduate on
time, and a high percentage of students who do not graduate on time (NOTG). Each
student is aligned along the vertical axis, with subjects by grade level aligned along the
horizontal axis. This figure is presented in color. Z-scored student grades are
represented by a heat map (center), with higher grades indicated by an increasing
intensity of red, lower grades indicated by an increasing intensity of blue, the mean
indicated by grey, and white indicating no data. Hierarchical clusters are represented by
a dendrogram (left), with a scale in standard deviation units for the clusters across the
hyperdimensional dataspace (bottom left). The dichotomous categorical variable NOTG is
represented by black bars (right). The dashed green line through the center heat map
indicates the division line between the two major clusters in the K-1 dataset. School and
grade level are indicated along the top horizontal axis (center top). Grade level increases
left to right, starting with Kindergarten (K), followed by Elementary, which includes
grade 1. Within each grade level, subjects are listed in a repeating pattern as follows:
K - mathematics, speaking, writing, reading; Elementary - 1st: mathematics, reading,
writing, spelling, handwriting, science, social studies.